Abstract
The dark web hosts a large number of illegal marketplaces trading in drugs, arms, currency, and other goods, which seriously harm the network environment. To detect and monitor the dark web, this paper proposes a dark web detection crawler method based on distributed Scrapy. The SOCKS5 protocol used by the dark web is converted into the HTTP protocol supported by the crawler, and Python's Scrapy framework is then used to detect and crawl dark web sites. The method has already discovered information on tens of thousands of dark web sites, including site titles, source code, and site types. Combining a dark web proxy environment with a Python crawler lets the program detect and crawl dark web sites, and so supports effective detection and monitoring of the dark web environment.
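The proxy-then-crawl idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library rather than Scrapy, and it assumes a local HTTP proxy (for example Privoxy) is already forwarding requests to a SOCKS5 port. The proxy address `127.0.0.1:8118` and the names `HTTP_PROXY`, `fetch`, `parse_page`, and `TitleParser` are hypothetical. In Scrapy itself, the same HTTP proxy would typically be supplied per request through `meta={"proxy": ...}`.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumption: an HTTP proxy on this address forwards traffic to a SOCKS5 port.
HTTP_PROXY = "http://127.0.0.1:8118"  # hypothetical address

# Matches 16-character (v2) and 56-character (v3) .onion host names.
ONION_RE = re.compile(r"\b[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion\b")


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Return the page title and the distinct .onion hosts mentioned in it."""
    parser = TitleParser()
    parser.feed(html)
    hosts = sorted(set(ONION_RE.findall(html)))
    return {"title": parser.title.strip(), "onion_hosts": hosts}


def fetch(url, proxy=HTTP_PROXY, timeout=60):
    """Download a page through the HTTP proxy and return its decoded source."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A crawler built on this sketch would call `fetch` on a seed URL, pass the result to `parse_page`, and queue the returned hosts for later visits. `parse_page` needs no network access, so it can be tried on any saved HTML.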
Authors
Yu Zhiwei (余志玮); He Yueshun (何月顺)
School of Information Engineering, East China University of Technology, Nanchang, Jiangxi 330013, China
Source
Computer Era (《计算机时代》), 2020, No. 4, pp. 21-25 (5 pages)