
The Research of Soybean Subject Webpage Resources Acquisition System (times cited: 1)
Abstract: When agricultural search engines retrieve information on a specific agricultural subject, they return large volumes of results with low accuracy. To address this, this paper studies and designs a soybean-subject webpage resource acquisition system built on the open-source search engine Nutch. Taking soybean information as the subject, it investigates subject relevance judgment. Drawing on the field-division (per-field weighting) idea of the BM25F model and building on the vector space model, it proposes an algorithm for judging soybean subject relevance. The IKAnalyzer Chinese word segmentation toolkit is integrated into Nutch to implement this relevance judgment. Experimental results show that the algorithm significantly improves the accuracy of soybean-subject webpage resource acquisition.
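The abstract names the ingredients of the relevance algorithm (BM25F-style field weighting combined with vector-space cosine similarity) but not its exact formula. The following is a minimal sketch of one plausible way to combine them, not the authors' implementation. It assumes pages have already been segmented into tokens (in the paper this step is done by IKAnalyzer inside Nutch). The field names, field weights, soybean topic terms, and the 0.3 threshold are all hypothetical illustrative values.

```python
import math
from collections import Counter

# Hypothetical per-field weights in the spirit of BM25F's field division:
# a term in the title counts more than the same term in the body.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}

# Hypothetical soybean topic vector (term -> importance weight).
# Glosses: soybean, variety, sowing, pests and diseases, yield.
TOPIC_VECTOR = {"大豆": 1.0, "品种": 0.6, "播种": 0.5, "病虫害": 0.5, "产量": 0.4}


def field_weighted_vector(fields):
    """Merge per-field term frequencies into a single page vector.

    `fields` maps a field name to its list of pre-segmented tokens.
    Each term frequency is multiplied by its field's weight (default 1.0).
    """
    vec = Counter()
    for name, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(name, 1.0)
        for term, tf in Counter(tokens).items():
            vec[term] += weight * tf
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_relevance(fields, topic=TOPIC_VECTOR):
    """Relevance of a page to the topic vector, in [0, 1]."""
    return cosine(field_weighted_vector(fields), topic)


def is_relevant(fields, threshold=0.3):
    """Hypothetical crawl decision: keep the page if relevance >= threshold."""
    return topic_relevance(fields) >= threshold


if __name__ == "__main__":
    page = {"title": ["大豆", "品种"], "body": ["大豆", "播种", "天气"]}
    print(round(topic_relevance(page), 3), is_relevant(page))
```

For the sample page the score is about 0.853, so the page would be kept. Because the title weight is larger than the body weight, moving "大豆" (soybean) from the body into the title raises the score. This is the effect the BM25F field-division idea is meant to capture.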
Affiliation: Northeast Agricultural University
Source: Journal of Agricultural Mechanization Research (农机化研究), Peking University core journal, 2014, No. 3, pp. 182-185 (4 pages)
Funding: National Natural Science Foundation of China (31101080)
Keywords: webpage crawling (information gathering); soybean subject; subject relevance; agriculture; search engine
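Related literature: 6 references; 29 secondary references; 21 co-citing documents; 70 co-cited documents; 1 citing document; 207 secondary citing documents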
