Abstract
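A full-text retrieval system for text was built on the open-source framework Lucene and extended to support full-text retrieval over multiple types of text. To address the shortcomings of Lucene's built-in Chinese tokenizer, an improved dictionary-based Chinese word segmentation method is proposed and applied to the retrieval system. The system achieves relatively high precision and recall in text retrieval and has some value for application and wider adoption.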
Full-text retrieval systems can effectively improve the precision and recall of text retrieval. This paper first presents a full-text retrieval system built on the open-source framework Lucene and extends it to support different types of text. Second, a Chinese word segmentation algorithm based on dictionary matching is proposed to improve the precision and recall of Lucene. The experimental results show that this full-text retrieval system achieves relatively high precision and recall and has promising application prospects.
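The abstract does not describe how the improved segmentation method works. As general background only, dictionary-based segmentation is commonly illustrated with forward maximum matching, sketched below. This is not the paper's algorithm: the function name `forward_max_match`, the `max_word_len` parameter, and the toy dictionary are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Greedy forward maximum matching (illustrative; not the paper's method).

    At each position, take the longest substring found in `dictionary`;
    fall back to a single character when nothing matches.
    """
    if max_word_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_word_len = max((len(w) for w in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for demonstration only.
toy_dictionary = {"全文", "检索", "全文检索", "系统", "中文", "分词"}
print(forward_max_match("全文检索系统", toy_dictionary))
```

Greedy matching prefers the longest dictionary entry at each position, which is simple and fast but can mis-segment ambiguous strings; improved dictionary-based methods typically target that weakness.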
Source
Journal of Zhejiang International Studies University (《浙江外国语学院学报》)
2013, No. 4, pp. 77-81 (5 pages)
Funding
Scientific Research Program of the Zhejiang Provincial Department of Education (Y201018459)
Keywords
full-text retrieval (全文检索)
search engine (搜索引擎)
Chinese word segmentation (中文分词)