Abstract
This paper designs and implements a Chinese word segmentation module whose main goal is to find a more effective way of processing Chinese words and thereby improve the Chinese-language capability of full-text retrieval systems. The module is built on Lucene, a widely used search-engine architecture, and implements a forward maximum matching algorithm with ambiguity elimination. For evaluation, the method is compared with existing approaches, and an implementation path is proposed for constructing an efficient Chinese retrieval system.
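The forward maximum matching algorithm named in the abstract can be sketched as follows. This is a minimal illustration only: the dictionary, the maximum word length, and the function name `fmm_segment` are assumptions for the example, and the paper's Lucene integration and ambiguity-elimination step are not reproduced here.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, at each position
    taking the longest dictionary word that matches; fall back to a
    single character when no dictionary word starts here."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in dictionary:
                tokens.append(word)
                i += length
                break
    return tokens

# Hypothetical toy dictionary for demonstration.
vocab = {"中文", "分词", "中文分词", "搜索", "引擎", "搜索引擎"}
print(fmm_segment("中文分词搜索引擎", vocab))  # ['中文分词', '搜索引擎']
```

Because the scan is greedy from the left, "中文分词" is preferred over the shorter matches "中文" and "分词"; resolving the cases where this greedy choice is wrong is the role of the ambiguity-elimination step the paper adds.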
Source
Journal of Sichuan University (Natural Science Edition), 2008, No. 5, pp. 1095-1099 (5 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
Sichuan Province Key Science and Technology Project (05GG021-003-2)
Keywords
Chinese word segmentation; search engine; Lucene; forward maximum matching algorithm