Abstract
As agriculture becomes increasingly information-driven and intelligent, the volume of agricultural information has grown explosively. Providing agricultural producers, managers, and researchers with a convenient and effective way to retrieve this information is a pressing problem for agricultural search engines. To address it, this paper proposes a vertical search engine framework for agricultural information based on Heritrix and Solr. The framework includes a Web information extraction module based on the hidden Markov model and a Chinese word segmentation module built on mmseg4j with an agricultural dictionary. The page ranking algorithm is also improved according to the characteristics of agricultural information search. The work offers a reference for further improving the user experience and efficiency of agricultural vertical search engines.
Source
《广东农业科学》
CAS
2015, Issue 5, pp. 139-144 (6 pages)
Guangdong Agricultural Sciences
Funding
Shandong Province Independent Innovation Special Project (2012CX90204)