Abstract
Existing work on spammer detection in microblogs relies mainly on explicit statistical features of users, such as the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. To exploit the directed nature of the follower network in microblogs, this paper presents DirTriangleC, an algorithm that counts local triangles in a directed network, and combines each user's number of posts with the local triangle ratio to detect implicit spammers. To address the false positives and false negatives of methods based on statistical features, it further proposes AttriBiVote, an algorithm based on statistical features and bidirectional voting, which classifies a user by combining the bidirectional propagation of trust with the statistical features of the user's neighbors. Both algorithms are evaluated on a real Twitter dataset containing about 0.26 million users and 10 million tweets. The results show that DirTriangleC discovers about 83.7% of the implicit spammers in a "completely inactive" state (dead accounts) and yields about three times as many suspected spammers as methods based on explicit statistical features. AttriBiVote finds more spammers, and with higher precision, than methods based on the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. Finally, the time cost of AttriBiVote is analyzed experimentally.
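The abstract does not spell out how DirTriangleC counts triangles, so the following is only a minimal sketch of the general idea of combining a local triangle ratio with post counts. It assumes that a "local triangle" at a user is a pair of that user's neighbors (followers or followees) joined by at least one follow edge in either direction, and that a suspect is a user with no posts and a triangle ratio of zero. The function names, the thresholds, and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def local_triangle_stats(edges):
    """Return {user: (closed_pairs, ratio)} for a directed follow graph.

    edges: iterable of (follower, followee) pairs.
    Assumption (not from the paper): a neighbor pair counts as a closed
    triangle if either neighbor follows the other.
    """
    succ = {}   # user -> set of accounts the user follows
    nbrs = {}   # user -> neighbors, ignoring edge direction
    for u, v in edges:
        if u == v:
            continue  # ignore self-follows
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    stats = {}
    for user, ns in nbrs.items():
        possible = len(ns) * (len(ns) - 1) // 2
        closed = sum(
            1 for a, b in combinations(ns, 2)
            if b in succ[a] or a in succ[b]
        )
        stats[user] = (closed, closed / possible if possible else 0.0)
    return stats


def flag_suspects(edges, tweet_counts, max_tweets=0, max_ratio=0.0):
    """Flag users with few posts and a low local triangle ratio.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    stats = local_triangle_stats(edges)
    return sorted(
        user for user, (_, ratio) in stats.items()
        if tweet_counts.get(user, 0) <= max_tweets and ratio <= max_ratio
    )


# Toy follow graph: (follower, followee)
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "a"), ("d", "b"), ("e", "a")]
tweet_counts = {"a": 120, "b": 40, "c": 15, "d": 0, "e": 0}
suspects = flag_suspects(edges, tweet_counts)
```

In this toy graph, user "d" has no posts but sits inside a triangle with "a" and "b", while user "e" has no posts and no triangles, so only "e" is flagged. A real implementation would need a scalable counting scheme for large graphs, since enumerating neighbor pairs is quadratic in the degree.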
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals (北大核心)
2013, No. 11, pp. 2336-2348 (13 pages)
Journal of Computer Research and Development
Funding
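National Natural Science Foundation of China (60933005, 91124002, 61302144)
National High Technology Research and Development Program of China (863 Program) (2010AA012505, 2011AA010702, 2012AA01A401, 2012AA01A402)
National Key Technology R&D Program of China (2012BAH38B04, 2012BAH38B06)
National Information Security 242 Program of China (2011A010)
National Basic Research Program of China (973 Program) (2013CB329601)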
Keywords
spammer
trust propagation
triangle counting
microblog
social networks