Abstract
This paper designs and implements a Lucene-based Chinese word segmentation module, offering a more effective way of handling Chinese text and improving the Chinese-processing capability of a full-text retrieval system. The module is built on Lucene, a widely used full-text search engine toolkit, and combines the forward maximum matching algorithm with an optimized Chinese segmentation dictionary to achieve higher efficiency and accuracy in Chinese word segmentation. In the evaluation, simulation experiments compare the module with StandardAnalyzer and CJKAnalyzer in both function and efficiency, and an implementation scheme for building an efficient Chinese retrieval system is proposed.
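The abstract names forward maximum matching as the core segmentation strategy. The sketch below illustrates only the general technique, not the paper's actual module: the HashSet-backed dictionary, its sample entries, and the assumed maximum word length are illustrative placeholders, and integration with Lucene's Analyzer/Tokenizer API is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of forward maximum matching (FMM) segmentation.
// Assumptions: a flat in-memory dictionary and MAX_WORD_LEN = 4;
// the paper's optimized dictionary structure is not specified here.
public class ForwardMaximumMatcher {
    private static final int MAX_WORD_LEN = 4; // assumed longest dictionary entry
    private final Set<String> dictionary;

    public ForwardMaximumMatcher(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Try the longest candidate first, shrinking until a dictionary hit.
            int end = Math.min(pos + MAX_WORD_LEN, text.length());
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            // Falls back to a single character when no dictionary word matches.
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("全文检索", "检索", "系统"));
        ForwardMaximumMatcher matcher = new ForwardMaximumMatcher(dict);
        System.out.println(matcher.segment("全文检索系统")); // [全文检索, 系统]
    }
}
```

FMM greedily commits to the longest dictionary match at each position, which is why the quality of the dictionary, a point the abstract emphasizes, directly drives segmentation accuracy.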
Source
Information Technology (《信息技术》)
2010, No. 12, pp. 50-54 (5 pages)