Abstract
To improve the retrieval precision and efficiency of a Lucene-based Chinese retrieval system, a Chinese word segmentation module and an index-document preprocessing module were added to the system after analyzing the structure of Lucene. The experimental method and procedure are described, and both the principle behind the improvement and the experimental data are analyzed. The analysis shows that adding the Chinese word segmentation module, and having the preprocessing module represent each document by a fixed number of extracted feature words, effectively improves the efficiency and precision of the Lucene retrieval system and strengthens its performance on Chinese text.
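The preprocessing idea in the abstract, replacing a document with a fixed number of feature words before indexing, can be illustrated with a minimal sketch. The abstract does not say how feature words are scored, so plain term frequency is used here as an assumption. The function name `top_feature_words`, the stopword set, and the sample tokens are hypothetical, and the input is assumed to be already segmented by a Chinese word segmentation step.

```python
from collections import Counter


def top_feature_words(tokens, k, stopwords=frozenset()):
    """Return up to k tokens with the highest term frequency.

    Assumption: term frequency stands in for the paper's (unspecified)
    feature-word weighting. `tokens` is an already-segmented document.
    """
    # Count every non-empty token that is not a stopword.
    counts = Counter(t for t in tokens if t and t not in stopwords)
    # most_common(k) returns (token, count) pairs sorted by descending count.
    return [word for word, _ in counts.most_common(k)]


# Hypothetical segmented document; "的" is treated as a stopword.
tokens = ["中文", "分词", "的", "索引", "分词", "检索", "的", "索引", "分词"]
features = top_feature_words(tokens, k=2, stopwords={"的"})
print(features)
```

In this sketch only the returned feature words would be passed on to the indexer, so the index holds fewer terms per document than it would for the full text.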
Source
《计算机应用与软件》
CSCD
2009, No. 6, pp. 175-177 (3 pages)
Computer Applications and Software
Keywords
Lucene
Index
Chinese word segmentation
Document preprocessing