Improvement of ProActive-based P-Spider1.0
Abstract: A distributed parallel Web Spider built around a center node suffers from an overloaded center node, unbalanced communication load, and poor scalability. To address these problems, this paper proposes an improved URL deduplication algorithm based on the Rabin fingerprint algorithm, together with an improved scheme that uses a peer-to-peer node structure. The improved distributed parallel Web Spider is designed and implemented on the ProActive middleware. Comparative experiments show that the improved Web Spider collects pages more efficiently, balances communication load, has no node bottleneck, and scales well.
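Source: Computer Engineering (《计算机工程》), 2010, No. 17, pp. 288-290 (3 pages). Indexed in: CAS, CSCD, Peking University Core Journals (北大核心).
Funding: Scientific Research Fund of the Guangxi Education Department (桂教科研[2006]26号); Guangxi University Doctoral Start-up Fund.
Keywords: Web Spider; ProActive middleware; Peer-to-Peer (P2P); distributed; center node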
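
The abstract names two ideas: URL deduplication based on Rabin fingerprints, and a peer-to-peer node structure with no center node. The Python sketch below illustrates how those two ideas could fit together; it is not the paper's algorithm, whose details are not given in this record. Everything in it is an assumption made for illustration: the names `rabin_fingerprint`, `UrlFilter`, and `owner_node`, the 64-bit polynomial `POLY` (x^64 + x^4 + x^3 + x + 1, commonly cited as irreducible over GF(2)), and the "fingerprint mod node count" rule for deciding which peer owns a URL.

```python
# Hypothetical sketch: Rabin-style fingerprints for URL deduplication.
# POLY, the class, and the function names are illustrative assumptions.

DEGREE = 64
# x^64 + x^4 + x^3 + x + 1 (assumed irreducible over GF(2)).
POLY = (1 << DEGREE) | 0b11011


def rabin_fingerprint(data: bytes) -> int:
    """Read `data` as a polynomial over GF(2) and reduce it modulo POLY."""
    fp = 0
    for byte in data:
        for shift in range(7, -1, -1):
            # Multiply the running remainder by x and add the next bit.
            fp = (fp << 1) | ((byte >> shift) & 1)
            # If the degree reached DEGREE, subtract (XOR) the modulus.
            if fp >> DEGREE:
                fp ^= POLY
    return fp


class UrlFilter:
    """Remembers fingerprints of seen URLs rather than the URL strings."""

    def __init__(self):
        self._seen = set()

    def add_if_new(self, url: str) -> bool:
        """Return True the first time a URL is offered, False afterwards."""
        fp = rabin_fingerprint(url.encode("utf-8"))
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


def owner_node(url: str, node_count: int) -> int:
    """Assumed assignment rule: the peer index is the fingerprint mod N."""
    return rabin_fingerprint(url.encode("utf-8")) % node_count
```

Under these assumptions, each peer would keep its own `UrlFilter` and forward a newly discovered URL directly to `owner_node(url, node_count)`, so no single node has to hold the global set of seen URLs. Two distinct URLs can in principle share a fingerprint, so a real crawler would have to accept a small chance of skipping a page.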