Detecting Spammers with a Bidirectional Vote Algorithm Based on Statistical Features in Microblogs (cited by: 11)
Abstract: Existing work on spammer detection in microblogs relies mainly on users' explicit statistical features, such as the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. Exploiting the directed nature of the follow network in microblogs, this paper presents DirTriangleC, an algorithm that counts local triangles in a directed network, and combines each user's tweet count with the local triangle ratio to detect implicit spammers. To address the false positives and false negatives of methods based purely on statistical features, it further proposes AttriBiVote, an algorithm based on statistical features and bidirectional voting, which classifies a user by combining the bidirectional propagation of trust with the statistical features of the user's neighbors. Experiments on a real Twitter dataset containing about 0.26 million users and 10 million tweets validate both algorithms. DirTriangleC discovers about 83.7% of the implicit spammers in a "completely inactive" state (dead accounts), and it yields about three times as many suspected spammers as methods based on explicit statistical features. AttriBiVote finds more spammers than statistical-feature methods, and its precision is higher than that of methods based on the tweet interval, the mention ratio, or the URL ratio. Finally, the time cost of AttriBiVote is analyzed.
Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 11, pp. 2336-2348 (13 pages).
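Funding: National Natural Science Foundation of China (60933005, 91124002, 61302144); National High Technology Research and Development Program of China ("863" Program) (2010AA012505, 2011AA010702, 2012AA01A401, 2012AA01A402); National Key Technology R&D Program of China (2012BAH38B04, 2012BAH38B06); National "242" Information Security Program (2011A010); National Basic Research Program of China ("973" Program) (2013CB329601).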
Keywords: spammer; trust propagation; triangle counting; microblog; social networks
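
The abstract describes combining each user's tweet count with a local triangle ratio on the directed follow network, but this record does not give the details of DirTriangleC. The following is therefore only a minimal sketch of the general idea, not the paper's algorithm. It assumes that a triangle at user u is a pair of u's neighbours (followers or followees) that are themselves linked by at least one follow edge, and that the ratio is the fraction of such pairs. The function names and the thresholds `max_tweets` and `max_ratio` are hypothetical.

```python
from itertools import combinations


def local_triangle_ratio(follows):
    """Return {user: ratio} for a directed follow graph.

    `follows` maps each user to the set of users they follow.
    Assumption (not from the paper): a triangle at u is a pair of u's
    neighbours, in either direction, joined by at least one follow edge.
    """
    # Build a neighbour view that ignores edge direction.
    neighbours = {}
    for u, outs in follows.items():
        neighbours.setdefault(u, set())
        for v in outs:
            if v == u:
                continue  # ignore self-follows
            neighbours[u].add(v)
            neighbours.setdefault(v, set()).add(u)

    ratios = {}
    for u, nbrs in neighbours.items():
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        if pairs == 0:
            ratios[u] = 0.0  # fewer than two neighbours: no triangle possible
            continue
        closed = sum(1 for v, w in combinations(nbrs, 2) if w in neighbours[v])
        ratios[u] = closed / pairs
    return ratios


def suspected_implicit_spammers(follows, tweet_counts, max_tweets=0, max_ratio=0.0):
    """Flag users with few tweets AND a low local triangle ratio.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    ratios = local_triangle_ratio(follows)
    return sorted(
        u for u, r in ratios.items()
        if tweet_counts.get(u, 0) <= max_tweets and r <= max_ratio
    )


# Toy data: a, b, c form a tight group; s follows unrelated users and never tweets.
follows = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "s": {"a", "d"}}
tweet_counts = {"a": 120, "b": 40, "c": 75, "d": 5, "s": 0}
print(suspected_implicit_spammers(follows, tweet_counts))
```

In this toy graph only `s` is flagged: it has no tweets and none of its neighbour pairs are connected. User `d` also has a ratio of zero but is kept out by its tweet count, which illustrates why the two signals are combined rather than used alone.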