
Design of an Agricultural Information Focused Web Crawler (original title: 一种面向农业信息主题网络爬虫的设计)

Cited by: 6
Abstract: When users search the web for agricultural information or related topics, general-purpose search engines return too many results, many of them only weakly relevant to the topic. To address this shortcoming, a design for a focused crawler targeting agricultural information is proposed, and its crawling strategy, structural design, working principle, and implementation are discussed in detail. Preliminary experiments show that a focused crawler built on this design clearly outperforms an ordinary crawler in precision, recall, and success rate when fetching agriculture-related pages.
Source: Journal of Anhui Agricultural Sciences (《安徽农业科学》), indexed in CAS and the Peking University Core Journals list, 2009, No. 20, pp. 9699-9700, 9824 (3 pages)
Keywords: Focused crawler; Search engine; Agricultural information; Degree of theme correlation