Abstract
The Kad network holds hundreds of millions of shared resources, a considerable part of which can be rated as questionable (sensitive). To understand the characteristics of resources in the Kad network, especially questionable ones, we use the Kad-network crawler Rainbow to measure and analyze the file resources held by peers. We find that: 1) both file popularity and the number of filenames corresponding to a file approximately follow a Zipf distribution; 2) co-occurring words across the multiple filenames that share the same file-content-hash allow questionable files to be identified more accurately; 3) questionable resources account for 6.34% of a random sample, and 74.8% of them are video files.
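The second finding relies on words that recur across the different filenames attached to one file-content-hash. The abstract does not describe how these words are extracted, so the following is only a minimal sketch under stated assumptions: the function name `cooccurring_words`, the `min_share` threshold, the simple alphanumeric tokenizer, and the sample filenames are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def cooccurring_words(filenames, min_share=0.5):
    """Return the words that appear in at least `min_share` of the filenames.

    Hypothetical helper: the tokenizer and threshold are illustrative choices.
    """
    # One set of lowercase alphanumeric tokens per filename, so a word is
    # counted at most once per filename.
    token_sets = [set(re.findall(r"[a-z0-9]+", name.lower())) for name in filenames]
    counts = Counter(word for tokens in token_sets for word in tokens)
    threshold = min_share * len(token_sets)
    return {word for word, n in counts.items() if n >= threshold}


# Hypothetical filenames observed for a single file-content-hash.
names_by_hash = {
    "hash_a": [
        "holiday trip 2009.avi",
        "Trip_Holiday_part1.avi",
        "holiday-trip.avi",
    ],
}

# Words shared by most filenames of the same hash; a classifier could then
# be applied to these words instead of to any single filename.
shared = {h: cooccurring_words(names) for h, names in names_by_hash.items()}
```

Words that survive the threshold are less sensitive to one misleading or truncated filename, which is one plausible reason why judging several filenames together could be more accurate than judging each one alone.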
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2010, No. 6, pp. 85-91 (7 pages)
Funding
National Natural Science Foundation of China (60803085, 60873245)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z452)
Keywords
Peer-to-peer network
Kad network
measurement and analysis
questionable resource