Web Crawler System Based on Linux (基于Linux的网络爬虫系统)

Cited by: 8
Abstract: To address the key factors that currently limit crawler efficiency, this paper studies the internal operating mechanism of a crawler, optimizes its architecture, and improves the related algorithms. Tests of the implemented crawler in a Linux network environment indicate that the proposed solutions and improvements are feasible, and that they improve both page-fetching efficiency and the overall performance of the crawler.
Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD, and the Peking University core journal list), 2010, No. 1, pp. 280-282 (3 pages)
Keywords: Web crawler; URL dispatch; DNS resolution; hash algorithm