
Design and Implementation of a Distributed Web Crawler (cited by: 2)
Abstract: URL seeds are generated from user-specified keywords by means of a search engine, and a distributed web crawler then extracts the web pages that match the user's needs to serve as a research corpus. Experimental results show that the distributed web crawler copes well with the need to extract a large corpus in a short time.
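Source: Journal of Jiangxi Normal University (Natural Science Edition), indexed in CAS and the Peking University core journal list, 2013, No. 4, pp. 382-386 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (60773087)
Keywords: distributed system; web crawler; design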
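
The abstract describes two steps: turning user keywords into seed URLs through a search engine, and spreading the crawl over several nodes. The following is a minimal sketch of those two steps only, not the authors' implementation. The search endpoint, the query parameter names (`q`, `start`), and the host-hash partitioning rule are all assumptions made for illustration; the abstract does not say how the paper assigns URLs to nodes.

```python
import hashlib
from urllib.parse import urlencode, urlsplit

# Hypothetical search endpoint; the abstract does not name the engine used.
SEARCH_ENDPOINT = "https://search.example.com/search"


def seed_urls(keywords, pages=2, per_page=10):
    """Build search-result URLs for each keyword; these act as crawl seeds."""
    seeds = []
    for keyword in keywords:
        for page in range(pages):
            # "q" and "start" are assumed parameter names, not a real API.
            query = urlencode({"q": keyword, "start": page * per_page})
            seeds.append(f"{SEARCH_ENDPOINT}?{query}")
    return seeds


def assign_worker(url, n_workers):
    """Map a URL to a worker index by hashing its host name.

    Hashing the host (an assumed policy) keeps every page of one site
    on the same node, which simplifies per-site politeness limits.
    """
    host = urlsplit(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers


def partition(urls, n_workers):
    """Split a list of URLs into one bucket per worker."""
    buckets = [[] for _ in range(n_workers)]
    for url in urls:
        buckets[assign_worker(url, n_workers)].append(url)
    return buckets
```

For example, `seed_urls(["web crawler", "corpus"])` yields four seed URLs (two result pages per keyword), and `partition(found_urls, 3)` splits the links discovered from those seeds across three crawler nodes.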

References (11)

  • 1. Tripathy A, Patra P K. A web mining architectural model of distributed crawler for internet searches using page rank algorithm [EB/OL]. [2012-08-18]. http://wenku.baidu.com/view/03181bd084254b35eefd3412.
  • 2. Zhou Lizhu, Lin Ling. A survey of focused crawler technology research [J]. Journal of Computer Applications, 2005, 25(9): 1965-1969. Cited by: 153
  • 3. Radhakishan V, aser F, Selvakumar S. CRAYSE: design and implementation of efficient text search algorithm in a web crawler [EB/OL]. [2012-08-19]. http://libra.msra.cn/Publication/1d414792/crayse-design-and-implementation-of-efficient-text-search-algorithm-in-a-web-crawler.
  • 4. Shekhar S, Agrawal R, Arya K V. An architectural framework of a crawler for retrieving highly relevant web documents by filtering replicated web collections [EB/OL]. [2012-08-19]. http://dl.acm.org/citation.cfm?id=1844773.
  • 5. Zhu Kunpeng, Xu Zhiming, Wang Xiaolong, et al. A full distributed web crawler based on structured network [M]. Berlin: Springer, 2008: 478-483.
  • 6. Li Xiaoming, Li Xing. Proceedings on advances in search engines and web mining [C]. Beijing: Higher Education Press, 2003: 1-8.
  • 7. Robert C M, Krishna B. SPHINX: a framework for creating personal, site-specific Web crawlers [J]. Computer Networks and ISDN Systems, 1998, 39(1/7): 119-130.
  • 8. Min Qiuying, Kuang Qingqiang. Design of an adaptive equalizer based on an improved BP neural network [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2012, 36(3): 276-279. Cited by: 1
  • 9. Zhou Mo, Zhang Jianyu, Dai Yafei. Design and optimization of a scalable DHT network crawler [J]. Scientia Sinica Informationis, 2010, 40(9): 1211-1222. Cited by: 7
  • 10. Wang Jue. Research on query load balancing strategies in overlapping P2P networks [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2012, 36(3): 292-296. Cited by: 1
