Abstract
ZeroNet, an emerging darknet, is a censorship-resistant P2P network built on Bitcoin cryptography and the BitTorrent protocol, and its user base continues to grow. Given darknet characteristics such as decentralization and censorship resistance, this paper starts from an analysis of the ZeroNet architecture and designs and implements a ZeroNet text crawling system based on simulated login. A semi-supervised Latent Dirichlet Allocation (LDA) topic model is then used to model and analyze the Chinese and English text from blogs and forums, which account for the largest share of ZeroNet sites. Compared with an unsupervised LDA topic model, the semi-supervised LDA model adopted in this paper yields better classification results, which is of practical value for monitoring new content on ZeroNet sites.
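The abstract does not say how supervision enters the LDA model. As a minimal sketch only, the following assumes one common semi-supervised variant in which a few seed words per topic receive a larger topic-word prior, trained with collapsed Gibbs sampling. The function name `seeded_lda`, the hyperparameters (`alpha`, `beta`, `seed_boost`), the toy documents, and the seed words are all hypothetical illustrations and are not taken from the paper.

```python
import random


def seeded_lda(docs, seed_words, n_iter=200, alpha=0.1, beta=0.01,
               seed_boost=5.0, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with seed-word-boosted priors.

    docs: list of token lists. seed_words: one list of seed words per topic.
    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_topics, n_words = len(seed_words), len(vocab)

    # Asymmetric topic-word prior: beta everywhere, plus an assumed boost
    # wherever a word is listed as a seed for that topic.
    prior = [[beta] * n_words for _ in range(n_topics)]
    for k, words in enumerate(seed_words):
        for w in words:
            if w in word_id:
                prior[k][word_id[w]] += seed_boost
    prior_sum = [sum(row) for row in prior]

    n_dk = [[0] * n_topics for _ in docs]            # document-topic counts
    n_kw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                             # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + prior[t][v])
                    / (n_k[t] + prior_sum[t])
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return n_dk, n_kw, vocab


# Toy data: made-up tokens and seed lists, for illustration only.
docs = [
    ["bitcoin", "wallet", "exchange", "bitcoin", "price"],
    ["forum", "post", "reply", "thread", "forum"],
    ["wallet", "price", "exchange", "market"],
    ["thread", "reply", "blog", "post"],
]
seed_words = [["bitcoin", "wallet"], ["forum", "thread"]]

doc_topics, topic_words, vocab = seeded_lda(docs, seed_words)
# Label each document with the topic that holds most of its tokens.
labels = [row.index(max(row)) for row in doc_topics]
```

Setting `seed_boost=0` reduces the sketch to ordinary unsupervised LDA with a symmetric prior, which gives a simple baseline for the kind of comparison the abstract describes.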
Authors
过小宇
丁建伟
江泓
陈周国
GUO Xiao-yu; DING Jian-wei; JIANG Hong; CHEN Zhou-guo (Science and Technology on Communication Security Laboratory, Chengdu 610000, China)
Source
Information Technology (《信息技术》)
2020, No. 3, pp. 32-38 (7 pages)
Funding
Supported by the National Key Research and Development Program of China (2016YFE0206700)
Keywords
ZeroNet
darknet
topic model
text classification