
A New Bandwidth Control Strategy for Web Crawlers (cited by: 2)

Abstract: How a web crawler should operate under a fixed bandwidth limit is a problem of great practical value, yet it has received little study. This paper proposes a crawler bandwidth control policy based on polite crawling of sites. The download speed of different sites is modeled, and the maximum request frequency for each site is derived from the polite-crawling constraint. On this basis, a site-oriented crawl control algorithm is presented. Experimental results show that the method can make full use of the bandwidth limit.
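Source: Control & Automation (《微计算机信息》), Peking University Core Journal, 2008, No. 33, pp. 76-77, 106 (3 pages)
Funding: National Natural Science Foundation of China project "Research on Key Technologies of Topical Crawlers Based on Incremental Learning" (No. 60603066)
Keywords: web crawler; bounded bandwidth; polite crawling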
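
The abstract describes combining a per-site download speed estimate with a polite-crawling request limit to stay within a bandwidth budget. The record does not give the paper's actual model or algorithm, so the following is only a minimal sketch of that general idea under stated assumptions: each site has a known estimated speed, average page size, and politeness delay, and sites are picked greedily. The names `Site`, `polite_rate`, and `select_sites` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    name: str
    speed: float      # estimated download speed, bytes/second (assumed known)
    page_size: float  # average page size, bytes (assumed known)
    delay: float      # politeness delay between requests, seconds (assumed)


def polite_rate(site: Site) -> float:
    """Average bandwidth (bytes/s) one polite connection to a site consumes.

    One request cycle is the transfer time plus the politeness delay.
    """
    cycle = site.page_size / site.speed + site.delay
    return site.page_size / cycle


def select_sites(sites: List[Site], limit: float) -> Tuple[List[Site], float]:
    """Greedily pick sites whose combined polite rate stays within `limit`.

    This greedy rule is an illustrative assumption, not the paper's algorithm.
    """
    chosen: List[Site] = []
    used = 0.0
    # Take the largest consumers first so smaller sites can fill what is left.
    for site in sorted(sites, key=polite_rate, reverse=True):
        rate = polite_rate(site)
        if used + rate <= limit:
            chosen.append(site)
            used += rate
    return chosen, used


if __name__ == "__main__":
    candidates = [
        Site("a", speed=100_000, page_size=50_000, delay=0.5),
        Site("b", speed=50_000, page_size=50_000, delay=1.0),
        Site("c", speed=200_000, page_size=100_000, delay=0.5),
    ]
    picked, used = select_sites(candidates, limit=130_000)
    print([s.name for s in picked], used)
```

With these made-up numbers, sites c and b are selected and together consume 125,000 of the 130,000 bytes/s budget; site a is skipped because adding it would exceed the limit.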

