Abstract
This paper analyzes the structure of the language analyzers in Lucene. To address their limitation of segmenting Chinese text only into single characters or two-character units, it adopts a dictionary-based forward maximum matching segmentation algorithm and designs and implements ZH_CNAnalyzer, a Chinese-English language analyzer built on Lucene. Experimental results show that the analyzer indexes documents containing Chinese and English text efficiently and meets the needs of practical applications.
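The dictionary-based forward maximum matching idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's ZH_CNAnalyzer code: the class name `ForwardMaxMatch`, the `segment` method, the `maxLen` parameter, and the tiny dictionary are all hypothetical, and the handling of English runs (kept whole and lowercased) and punctuation (skipped) is an assumption. Wiring such a segmenter into a Lucene analyzer is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of forward maximum matching (FMM); not the paper's implementation.
final class ForwardMaxMatch {

    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Scans left to right; at each position takes the longest dictionary word
    // of at most maxLen characters, falling back to a single character.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {   // assumption: drop whitespace and punctuation
                i++;
                continue;
            }
            if (isAsciiAlnum(c)) {                 // assumption: an English/digit run is one token
                int j = i;
                while (j < text.length() && isAsciiAlnum(text.charAt(j))) {
                    j++;
                }
                tokens.add(text.substring(i, j).toLowerCase(Locale.ROOT));
                i = j;
                continue;
            }
            // Start with the longest candidate window and shrink until it is in the dictionary.
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));    // a single character if nothing matched
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "分词", "中文分词", "算法"));
        System.out.println(segment("Lucene中文分词算法", dict, 4)); // prints [lucene, 中文分词, 算法]
    }
}
```

Because the match is greedy from the left, the result depends on dictionary coverage and on `maxLen`: with the words 研究, 研究生 and 生命 in the dictionary, the string 研究生命 is split as 研究生 / 命 rather than 研究 / 生命.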
Source
《图书情报工作》
CSSCI
PKU Core Journal (北大核心)
2009, No. 15, pp. 118-121 (4 pages)
Library and Information Service