Abstract
Traditional search engines return very large result sets with poor domain specificity and low precision. To address these problems, and building on an analysis of how the open-source search engine Nutch works, this paper carries out secondary development of Nutch: a lexicon-based forward maximum matching algorithm performs Chinese word segmentation, a keyword-based vector space model judges topic relevance, and result ranking is improved on the basis of the PageRank algorithm. An agricultural domain ontology is applied at each stage of the search engine, namely information collection and filtering, information retrieval, and related-term recommendation. On this basis, a Nutch-based agricultural vertical search engine is designed and implemented. Experimental results show that the engine improves retrieval precision and meets users' need for domain-specific results.
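As an illustration of two techniques named in the abstract, the following minimal Python sketch shows lexicon-based forward maximum matching and a keyword-based vector space relevance score. It is not the paper's implementation (which extends Nutch); the function names `fmm_segment` and `cosine_relevance`, the sample lexicon, and the topic weights are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def fmm_segment(text, lexicon, max_len=None):
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def cosine_relevance(tokens, topic_weights):
    """Cosine similarity between a term-frequency vector and a topic keyword vector."""
    tf = Counter(tokens)
    dot = sum(tf[t] * w for t, w in topic_weights.items())
    norm_doc = math.sqrt(sum(c * c for c in tf.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_weights.values()))
    if norm_doc == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_doc * norm_topic)


# Hypothetical lexicon and topic keyword weights (assumptions, not from the paper).
lexicon = {"农业", "垂直", "搜索", "引擎", "搜索引擎"}
topic_weights = {"农业": 1.0, "搜索引擎": 1.0}

tokens = fmm_segment("农业垂直搜索引擎", lexicon)
score = cosine_relevance(tokens, topic_weights)
```

In a focused crawler, a page whose score falls below some chosen threshold could be filtered out as off-topic; the threshold itself would have to be tuned and is not specified in the abstract.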
Source
《计算机工程与设计》
CSCD
Peking University Core Journals (北大核心)
2014, No. 6, pp. 2239-2243 (5 pages)
Computer Engineering and Design
Funding
National Key Technology R&D Program of China during the 12th Five-Year Plan period (2011BAD21B05, 2012BAH30F00, 2012BAH30F01)