
Customizable Focused Web Crawler (times cited: 4)

A Customized Focusing Crawler
Abstract: As Internet resources grow explosively and users' needs become increasingly personalized, search engines dedicated to specific topics are increasingly favored. A focused crawler is an important component of a topic-specific search engine: it downloads from the Web only documents on a given topic. A customizable focused crawler is a topic crawler whose topic can be selected and customized. This paper presents a more effective set of crawler algorithms with three properties: high efficiency (resources with higher topic relevance are downloaded first), low resource usage (the URL queue is kept short), and easy topic portability (the topic is customizable).
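The abstract names three properties (relevance-first downloading, a short URL queue, a user-definable topic) but does not spell out the algorithm. The following is a minimal Python sketch of a generic best-first focused crawler showing one common way to obtain those properties; it is not the authors' algorithm. Assumptions, all hypothetical: the topic is a user-supplied keyword-weight dict (EXAMPLE_TOPIC), page relevance is cosine similarity over term counts, an outlink inherits its parent page's score as its priority, the frontier is capped at max_queue entries, and WEB is an in-memory stub link graph standing in for HTTP fetching.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this stub keeps the sketch offline.
WEB = {
    "http://example.org/": ("web crawler search engine news",
                            ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("focused crawler topic search engine",
                             ["http://example.org/c"]),
    "http://example.org/b": ("cooking recipes and travel",
                             ["http://example.org/d"]),
    "http://example.org/c": ("crawler frontier priority queue", []),
    "http://example.org/d": ("holiday travel photos", []),
}

# Hypothetical user-defined topic: keyword -> weight. Swapping this dict
# re-targets the crawler without touching the crawl logic.
EXAMPLE_TOPIC = {"crawler": 2.0, "search": 1.0, "engine": 1.0, "topic": 1.0}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, topic):
    """Cosine similarity between a page's term counts and the topic weights."""
    tf = Counter(tokenize(text))
    dot = sum(tf[term] * weight for term, weight in topic.items())
    norm_page = math.sqrt(sum(c * c for c in tf.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def focused_crawl(seeds, topic, max_pages=10, max_queue=3, threshold=0.1):
    """Best-first crawl: always fetch the most promising known URL next."""
    frontier = [(-1.0, url) for url in seeds]  # min-heap of (-priority, url)
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in WEB:
            continue  # unreachable page
        text, links = WEB[url]
        score = relevance(text, topic)
        visited.append((url, round(score, 3)))
        if score < threshold:
            continue  # off-topic page: do not follow its links
        for link in links:
            if link not in seen:
                seen.add(link)
                # An outlink inherits its parent's relevance as its priority.
                heapq.heappush(frontier, (-score, link))
        if len(frontier) > max_queue:
            # Bound the URL queue: keep only the max_queue best entries.
            # nsmallest returns a sorted list, which is already a valid heap.
            frontier = heapq.nsmallest(max_queue, frontier)
    return visited


if __name__ == "__main__":
    for url, score in focused_crawl(["http://example.org/"], EXAMPLE_TOPIC):
        print(score, url)
```

In this toy graph the off-topic page b is fetched last, after the more relevant page c, and its link to d is never followed. With max_queue=1, b is dropped from the queue altogether. The cap trades recall for memory: discarded URLs remain in seen and are not rediscovered later.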
Authors: Zou Hailiang (邹海亮), Sun Li (孙莉)
Source: Electronic Science and Technology (《电子科技》), 2009, No. 1, pp. 47-50 (4 pages)
Keywords: information collection; search engine; web crawler