
Design and Implementation of a Distributed Web Spider Based on a Central Control Node
Abstract: To cope with the ever-growing number of Web pages on the Internet, a Distributed Web Spider (DWS) built on distributed technology was proposed. Acting as the front end of a search engine, DWS downloads Web pages quickly and efficiently to obtain a more complete image of the Internet. It sets up a central control node to coordinate the behavior of the individual Web Spiders, uses a breadth-first search crawling policy to obtain high-quality pages, caches Domain Name System (DNS) lookups to speed up access to Web servers, and increases the number of parallel threads to raise download speed. Web Spider nodes or sub-central-control nodes can be added dynamically, giving the system strong flexibility and scalability.
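The abstract names four mechanisms: a central control node that coordinates the spiders, a breadth-first crawl order, a DNS cache, and multithreaded downloading, plus nodes that can join at runtime. The paper's own implementation is not reproduced here; the following is only a minimal Python sketch of how these pieces might fit together. Everything in it is a hypothetical assumption: the toy link graph `WEB`, the fake resolver table `DNS`, the class names `CentralController`, `Spider` and `DnsCache`, and the choice to partition URLs among spiders by host name. Downloads are simulated so the sketch runs offline.

```python
"""Hypothetical sketch of DWS ideas: a central controller holding a global
breadth-first frontier, spiders with a DNS cache and a download thread pool,
and spiders that can register at runtime. The Web is simulated in memory."""
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
import threading

# Hypothetical toy Web: page URL -> outgoing links.
WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://b.example/2"],
    "http://b.example/": ["http://a.example/", "http://c.example/"],
    "http://b.example/2": [],
    "http://c.example/": ["http://b.example/2"],
}
# Hypothetical DNS table standing in for a real resolver.
DNS = {"a.example": "10.0.0.1", "b.example": "10.0.0.2", "c.example": "10.0.0.3"}


class DnsCache:
    """Resolves each host once; later lookups are served from memory."""
    def __init__(self):
        self._cache, self._lock, self.misses = {}, threading.Lock(), 0

    def resolve(self, host):
        with self._lock:                      # threads share one cache
            if host not in self._cache:
                self.misses += 1              # only a miss reaches the "resolver"
                self._cache[host] = DNS[host]
            return self._cache[host]


class Spider:
    """One crawling node that downloads its batch with a thread pool."""
    def __init__(self, name, threads=4):
        self.name, self.threads, self.dns = name, threads, DnsCache()

    def fetch(self, url):
        self.dns.resolve(urlsplit(url).hostname)   # resolve through the cache
        return url, WEB.get(url, [])               # simulated download

    def crawl(self, urls):
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            return list(pool.map(self.fetch, urls))


class CentralController:
    """Owns the global BFS frontier and assigns URLs to registered spiders."""
    def __init__(self, seeds):
        self.frontier, self.seen, self.spiders = deque(seeds), set(seeds), []
        self.order = []                             # pages in download order

    def register(self, spider):                     # spiders may join at runtime
        self.spiders.append(spider)

    def assign(self, url):
        # Assumption: partition by host so one spider owns each site.
        # A deterministic sum is used because Python's str hash is randomized.
        host = urlsplit(url).hostname
        return self.spiders[sum(map(ord, host)) % len(self.spiders)]

    def run(self):
        while self.frontier:
            # Take one whole BFS level, then group it by owning spider.
            level = [self.frontier.popleft() for _ in range(len(self.frontier))]
            batches = {}
            for url in level:
                batches.setdefault(self.assign(url), []).append(url)
            # Batches are dispatched one after another in this sketch.
            for spider, urls in batches.items():
                for url, links in spider.crawl(urls):
                    self.order.append(url)
                    for link in links:              # enqueue unseen links only
                        if link not in self.seen:
                            self.seen.add(link)
                            self.frontier.append(link)
        return self.order
```

For example, registering two `Spider` objects with `CentralController(["http://a.example/"])` and calling `run()` visits the five toy pages level by level, and each host is resolved only once, by the spider that owns it. Because `assign` uses `len(self.spiders)`, registering a new spider reshuffles the host-to-spider mapping; a system that adds nodes while crawling would probably prefer consistent hashing. Sub-central-control nodes are not modeled here.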
Source: Journal of Computer Applications (《计算机应用》), CSCD and Peking University core journal, 2010, No. 12, pp. 316-318 (3 pages)
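Funding: Sichuan Provincial Science and Technology Plan Project (2008GZ0003); Sichuan Provincial Key Science and Technology Research Project (07GG006-019)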
Keywords: Distributed Web Spider (DWS); page quality; search engine; distributed computing
