Building a dark web probe crawler with Scrapy-redis (Cited by: 8)
Abstract The dark web hosts a large number of websites for illegal trade in drugs, arms, currency, and the like, which seriously harm the network environment. To detect and monitor the dark web, this paper proposes a dark web probe crawler based on distributed Scrapy. The socks5 protocol used by the dark web is converted into the http protocol supported by the crawler, and Python's Scrapy crawler framework is then used to probe and crawl dark web sites. The method has found information on tens of thousands of dark web sites, including site title, source code, and site type. Combining a dark web proxy environment with a Python crawler lets the program probe and crawl dark web sites and thereby detect and monitor the dark web environment.
Authors Yu Zhiwei (余志玮); He Yueshun (何月顺) (School of Information Engineering, East China University of Technology, Nanchang, Jiangxi 330013, China)
Source Computer Era (《计算机时代》), 2020, No. 4, pp. 21-25 (5 pages)
Keywords dark web; proxy environment; Scrapy crawler framework; website
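
The abstract describes two steps: exposing the dark web's socks5 proxy as an http proxy, then crawling through it with distributed Scrapy. The record does not show the authors' code, so the following is only a minimal sketch under stated assumptions. The proxy address `http://127.0.0.1:8118` (a local HTTP-to-SOCKS5 bridge), the Redis URL, the class name `DarkWebProxyMiddleware`, and the module path `myproject.middlewares` are all hypothetical. The `scrapy_redis` scheduler and dupefilter paths are that library's documented settings.

```python
# Sketch only: not the paper's implementation.

# Assumption: a local bridge converts HTTP proxy traffic to SOCKS5
# and listens on this (hypothetical) address.
HTTP_PROXY_URL = "http://127.0.0.1:8118"


class DarkWebProxyMiddleware:
    """Hypothetical Scrapy downloader middleware.

    Scrapy middlewares are plain classes, so no Scrapy import is needed here.
    """

    def __init__(self, proxy_url=HTTP_PROXY_URL):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # None tells Scrapy to continue handling the request


# Settings fragment for sharing the request queue across crawler nodes.
SETTINGS = {
    "SCHEDULER": "scrapy_redis.scheduler.Scheduler",
    "DUPEFILTER_CLASS": "scrapy_redis.dupefilter.RFPDupeFilter",
    "REDIS_URL": "redis://localhost:6379",  # hypothetical Redis instance
    "DOWNLOADER_MIDDLEWARES": {
        # 350 runs before the built-in HttpProxyMiddleware (750).
        "myproject.middlewares.DarkWebProxyMiddleware": 350,
    },
}


class FakeRequest:
    """Stand-in for scrapy.Request, used only to exercise the sketch."""

    def __init__(self, url):
        self.url = url
        self.meta = {}
```

Because the middleware only sets `request.meta["proxy"]`, the spider code itself stays unaware of the proxy; swapping the bridge address is a one-line change.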
