Abstract
General-purpose search engines tend to return too many results with weak topical relevance when users search for agricultural information or related subjects. To address this, a design for a focused crawler targeting agricultural information is proposed, and its crawling strategy, structural design, working principle, and implementation are discussed in detail. Preliminary experimental results show that a focused crawler based on this design clearly outperforms an ordinary crawler in accuracy, recall, and success rate when fetching agriculture-related web pages.
Source
Journal of Anhui Agricultural Sciences (《安徽农业科学》)
CAS
PKU Core Journal (北大核心)
2009, Issue 20, pp. 9699-9700, 9824 (3 pages)
Keywords
Focused crawler
Search engine
Agricultural information
Degree of topic relevance