Abstract
In view of the ever-growing number of Web pages on the Internet, a Web spider system named Distributed Web Spider (DWS), designed and implemented with distributed technology, is proposed. Acting as the front end of a search engine, it downloads Web pages quickly and efficiently to obtain a more complete image of the entire Internet. DWS sets up a central control node to coordinate the behavior of the individual Web spiders, uses a breadth-first crawling policy to obtain high-quality pages, caches Domain Name System (DNS) lookups to speed up access to Web servers, increases the number of parallel download threads to raise download speed, and allows Web spider nodes or sub-central control nodes to be added dynamically, giving the system strong flexibility and scalability.
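The crawling mechanisms summarized in the abstract (a breadth-first frontier, a local DNS cache, and parallel download threads) can be illustrated with a minimal sketch. The code below is only an illustration under assumptions, not the authors' DWS implementation: the seed URL, thread count, and page limit are hypothetical, and the central control node and the dynamic joining of spider nodes are omitted.

import re
import socket
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

dns_cache = {}  # hostname -> IP, so each host name is resolved only once

def resolve(hostname):
    # Consult the local DNS cache before asking the resolver (illustrates DNS caching).
    if hostname not in dns_cache:
        dns_cache[hostname] = socket.gethostbyname(hostname)
    return dns_cache[hostname]

def fetch(url):
    # Download one page and return the absolute URLs of its out-links.
    try:
        resolve(urlparse(url).hostname)  # warm the cache; a real spider would reuse this IP
        html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
    except Exception:
        return []
    return [urljoin(url, href) for href in re.findall(r'href="([^"#]+)"', html)]

def bfs_crawl(seeds, max_pages=100, workers=8):
    # FIFO frontier gives breadth-first order; a thread pool downloads batches in parallel.
    frontier, seen = deque(seeds), set(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(seen) < max_pages:
            batch = [frontier.popleft() for _ in range(min(workers, len(frontier)))]
            for links in pool.map(fetch, batch):
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        frontier.append(link)
    return seen

if __name__ == "__main__":
    # Hypothetical seed; in DWS the central control node would distribute URLs to spiders.
    print(len(bfs_crawl(["http://example.com/"], max_pages=20)), "URLs discovered")

In the full DWS architecture, the frontier and the assignment of URLs to spiders would be coordinated by the central control node rather than held in a single process.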
Source
《计算机应用》
CSCD
Peking University Core Journal
2010, No. 12, pp. 316-318 (3 pages)
Journal of Computer Applications
Funding
Science and Technology Program of Sichuan Province (2008GZ0003)
Key Technology Research Project of Sichuan Province (07GG006-019)
Keywords
Distributed Web Spider (DWS)
page quality
search engine
distributed computing