Abstract
To speed up Chinese full-text retrieval while still tolerating Chinese segmentation ambiguity and respecting word-length limits, an improved Chinese word segmentation algorithm is proposed on the basis of existing Chinese word segmentation algorithms. During index construction, the algorithm establishes a mapping between the relevant words in the text and the lexicon, and the lexicon is restructured so that it maps onto those words more effectively, which simplifies segmentation. Experiments show that the improved algorithm reduces retrieval time to 1/2 and 1/5 of that of the existing segmentation algorithms, effectively increasing the speed of Chinese full-text retrieval.
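The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: a lexicon reorganized for fast lookup, a word-length limit, and a character-based index. It assumes plain forward longest-match segmentation, which may differ from the paper's method. All function names and the sample lexicon are hypothetical.

```python
from collections import defaultdict


def build_lexicon_index(words):
    """Group lexicon words by first character, longest first (assumed layout)."""
    index = defaultdict(list)
    for word in words:
        if word:  # skip empty entries
            index[word[0]].append(word)
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index


def segment(text, lexicon_index, max_len=4):
    """Forward longest-match segmentation with a word-length limit."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for word in lexicon_index.get(text[i], []):
            if len(word) <= max_len and text.startswith(word, i):
                match = word
                break
        tokens.append(match)
        i += len(match)
    return tokens


def build_char_index(text):
    """Character-based index: each character maps to its positions in the text."""
    positions = defaultdict(list)
    for pos, ch in enumerate(text):
        positions[ch].append(pos)
    return positions


# Hypothetical sample lexicon, for illustration only.
lexicon_index = build_lexicon_index(["中文", "全文", "检索", "全文检索", "分词"])
tokens = segment("中文全文检索", lexicon_index)
char_index = build_char_index("中文全文检索")
```

Grouping the lexicon by first character means each position in the text is compared only against words that could start there, and lowering `max_len` forces shorter tokens when a length limit is required.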
Source
《吉林大学学报(信息科学版)》
CAS
2013, No. 3, pp. 320-323 (4 pages)
Journal of Jilin University (Information Science Edition)
Funding
Supported by the Science and Technology Development Planning Fund of the Jilin Provincial Department of Education (2012373)
Keywords
Chinese full-text retrieval (中文全文检索)
Chinese word segmentation (中文分词)
character-based indexing (字索引)