Abstract
Distributed parallel Web Spiders built around a central node suffer from an overloaded central node, unbalanced communication load, and poor scalability. To address these problems, this paper proposes an improved URL deduplication algorithm based on the Rabin fingerprint algorithm, together with an improved scheme in which all nodes are equal peers (a Peer-to-Peer structure). An improved distributed parallel Web Spider is designed and developed on the ProActive middleware. Comparative experiments show that the improved Web Spider collects pages more efficiently, balances communication load, has no node bottleneck, and scales well.
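The abstract does not describe how fingerprints and peers interact, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (it does not use ProActive). It assumes each URL is normalized, reduced to a 64-bit Rabin fingerprint (the URL's bits read as a polynomial over GF(2), taken modulo a fixed polynomial), and that the fingerprint both serves as the deduplication key and decides which peer owns the URL (`fp mod N`), so no central node has to hold the global "seen" set. The modulus `POLY`, the normalization rules, and the names `normalize`, `Peer`, and `dispatch` are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, urlunsplit

# Assumed modulus: x^64 + x^4 + x^3 + x + 1, a commonly used irreducible
# polynomial over GF(2). Rabin's scheme picks a random irreducible polynomial;
# a fixed one is used here only to keep the sketch deterministic.
DEGREE = 64
POLY = (1 << DEGREE) | 0b11011


def rabin_fingerprint(data: bytes) -> int:
    """Return data, read as a polynomial over GF(2), reduced modulo POLY."""
    fp = 1  # implicit leading 1 bit, so leading zero bytes still change the result
    for byte in data:
        for shift in range(7, -1, -1):
            fp = (fp << 1) | ((byte >> shift) & 1)  # multiply by x, add next bit
            if fp >> DEGREE:  # degree reached 64: subtract (XOR) the modulus
                fp ^= POLY
    return fp


def normalize(url: str) -> str:
    """Hypothetical canonical form: lowercase scheme/host, drop the fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


@dataclass
class Peer:
    """One crawler node in a hypothetical peer-to-peer layout (no central node)."""
    node_id: int
    seen: set = field(default_factory=set)        # fingerprints of URLs this peer owns
    frontier: list = field(default_factory=list)  # new URLs waiting to be fetched

    def offer(self, fp: int, url: str) -> bool:
        """Queue url unless its fingerprint was already seen; report whether it was new."""
        if fp in self.seen:
            return False  # duplicate: already queued or fetched by this owner
        self.seen.add(fp)
        self.frontier.append(url)
        return True


def dispatch(url: str, peers: list) -> bool:
    """Route a discovered URL to the peer that owns its fingerprint."""
    canonical = normalize(url)
    fp = rabin_fingerprint(canonical.encode("utf-8"))
    owner = peers[fp % len(peers)]  # static hash partitioning (an assumption)
    return owner.offer(fp, canonical)


if __name__ == "__main__":
    peers = [Peer(i) for i in range(4)]
    urls = ["http://Example.com/a", "http://example.com/a#top", "http://example.com/b"]
    print([dispatch(u, peers) for u in urls])  # [True, False, True]
```

Because every URL has exactly one owning peer, a duplicate check touches only that peer's local set, and discovered links spread across peers roughly evenly. Plain `fp mod N` reassigns most URLs when a node joins or leaves; a real system would likely need a more stable partitioning to stay scalable.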
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2010, No. 17, pp. 288-290 (3 pages)
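Funding
Scientific Research Foundation of the Guangxi Education Department (Gui Jiao Ke Yan [2006] No. 26)
Doctoral Start-up Foundation of Guangxi University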