Design and Implementation of a Distributed Topic Crawler (times cited: 3)

RESEARCH AND IMPLEMENTATION OF A DISTRIBUTED TOPIC CRAWLER
Abstract: This paper studies and implements a distributed web crawler system. The system architecture has two main parts, the control node and the crawl nodes, and the paper describes solutions to the key technical problems of the distributed system. The system uses a two-level hash mapping algorithm for task assignment, which solves the problem of allocating URLs in a target-oriented, load-balanced way. It also uses message passing so that the nodes cooperate with one another. The paper proposes a genetic algorithm as the search strategy of this topic crawler and presents an improved method for the web-page update strategy.
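The abstract names a two-level hash mapping for URL allocation but gives no details. Below is a minimal Python sketch of one common interpretation. This is an assumption, not the paper's algorithm:

- **Level 1** hashes a URL's host name into a fixed number of virtual buckets.
- **Level 2** is a bucket-to-node table held by the control node, which can move buckets between crawl nodes to balance load.

`TwoLevelAssigner`, `num_buckets`, `bucket_loads` and `rebalance_once` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def stable_hash(text: str) -> int:
    # Built-in hash() is randomised per process, so use MD5 to get a value
    # that the control node and every crawl node compute identically.
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


class TwoLevelAssigner:
    """Hypothetical two-level URL-to-node mapping (illustrative only)."""

    def __init__(self, nodes, num_buckets=64):
        self.nodes = list(nodes)
        self.num_buckets = num_buckets
        # Level 2: bucket -> crawl node, initially assigned round-robin.
        self.bucket_to_node = {b: self.nodes[b % len(self.nodes)]
                               for b in range(num_buckets)}

    def bucket_of(self, url):
        # Level 1: hash the (lower-cased) host name, so every page of a site
        # lands in the same bucket and therefore on the same crawl node.
        host = urlsplit(url).hostname or ""
        return stable_hash(host) % self.num_buckets

    def node_of(self, url):
        return self.bucket_to_node[self.bucket_of(url)]

    def move_bucket(self, bucket, node):
        # Remapping one bucket moves only that bucket's hosts; no other URL is rehashed.
        self.bucket_to_node[bucket] = node

    def node_loads(self, bucket_loads):
        # bucket_loads: bucket -> current load (e.g. queued URLs) reported by the nodes.
        loads = {n: 0 for n in self.nodes}
        for b, load in bucket_loads.items():
            loads[self.bucket_to_node[b]] += load
        return loads

    def rebalance_once(self, bucket_loads):
        """Move one bucket from the busiest to the idlest node if that lowers the peak load."""
        loads = self.node_loads(bucket_loads)
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        candidates = [b for b, n in self.bucket_to_node.items()
                      if n == busiest and bucket_loads.get(b, 0) > 0]
        if busiest == idlest or not candidates:
            return None
        b = min(candidates, key=lambda c: bucket_loads[c])
        if loads[idlest] + bucket_loads[b] >= loads[busiest]:
            return None  # moving it would not reduce the maximum load
        self.move_bucket(b, idlest)
        return b
```

The virtual buckets decouple hashing from the number of crawl nodes. Rebalancing or adding a node then only edits the level-2 table, instead of rehashing every URL as a plain `hash(url) % node_count` scheme would.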
Source: Computer Applications and Software (《计算机应用与软件》), CSCD-indexed, 2010, No. 12, pp. 135-138 (4 pages)
Keywords: topic crawler (Web crawler); distributed; genetic algorithm; search engine
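The abstract proposes a genetic algorithm as the topic crawler's search strategy but does not say how it is encoded or what its fitness function is. The Python sketch below assumes one possible formulation, not the paper's:

- Each chromosome is a bit string marking which frontier URLs go into the next crawl batch.
- Fitness is the summed topic relevance, minus a penalty for fetching several pages of one host.
- A repair step keeps each batch within a budget.

The relevance scores, `host_penalty`, `budget` and `ga_select` are all hypothetical.

```python
import random
from collections import Counter


def fitness(bits, frontier, host_penalty):
    # frontier entries are (url, host, relevance); relevance is assumed to come
    # from some topic classifier, which is not shown here.
    chosen = [frontier[i] for i, b in enumerate(bits) if b]
    score = sum(rel for _, _, rel in chosen)
    per_host = Counter(host for _, host, _ in chosen)
    # Penalise fetching several pages of one host in the same batch.
    score -= host_penalty * sum(c - 1 for c in per_host.values())
    return score


def repair(bits, frontier, budget):
    # Keep the batch within budget by dropping the least relevant selected URLs.
    bits = list(bits)
    chosen = sorted((i for i, b in enumerate(bits) if b), key=lambda i: frontier[i][2])
    while len(chosen) > budget:
        bits[chosen.pop(0)] = 0
    return bits


def ga_select(frontier, budget, host_penalty=0.3, pop_size=30,
              generations=60, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(frontier)

    def fit(ind):
        return fitness(ind, frontier, host_penalty)

    def pick(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    pop = [repair([rng.randint(0, 1) for _ in range(n)], frontier, budget)
           for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best]  # elitism: the best batch found so far always survives
        while len(children) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(repair(child, frontier, budget))
        pop = children
        best = max(pop, key=fit)
    return [frontier[i][0] for i, b in enumerate(best) if b]
```

The host penalty is what makes this more than a top-k sort by relevance. The GA has to trade a slightly less relevant page on a new host against a second page from a host already in the batch.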

References: 6

Secondary references (works cited by this paper's references): 38

Co-citing literature (works sharing references with this paper): 149

Co-cited literature (works cited together with this paper): 21

Citing literature: 3

Secondary citing literature: 9
