Detecting Spammers with a Bidirectional Vote Algorithm Based on Statistical Features in Microblogs (cited by: 11)

Abstract: Existing work on spammer detection in microblogs relies mainly on users' explicit statistical features, such as the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. Because the follow network in microblogs is directed, this paper presents DirTriangleC, an algorithm that counts local triangles in a directed network, and combines each user's tweet count with the local triangle ratio to detect implicit spammers. To address the false positives and false negatives of methods based on statistical features alone, the paper also proposes AttriBiVote, an algorithm based on statistical features and bidirectional voting, which classifies a user by the bidirectional propagation of trust together with the statistical features of the user's neighbors. Both algorithms are evaluated on a real Twitter dataset containing about 0.26 million users and 10 million tweets. DirTriangleC discovers about 83.7% of the implicit spammers in a "completely inactive" state (dead accounts) and identifies about three times as many suspected spammers as methods based on explicit statistical features, that is, an increase of about twofold. AttriBiVote detects more spammers, and with higher precision, than methods based on the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. Finally, the time cost of AttriBiVote is analyzed.
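The abstract does not say how DirTriangleC defines or counts a triangle in the directed follow network, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that two users are adjacent if either one follows the other, counts the triangles each user takes part in, and divides by the number of neighbor pairs to get a local triangle ratio. The function names, the input layout (`follows` maps a user to the set of users they follow), and the thresholds in `flag_suspects` are all hypothetical.

```python
from itertools import combinations


def local_triangle_ratio(follows):
    """Return {user: (triangle_count, triangle_ratio)} for a follow graph.

    Assumption: an edge counts as present if it exists in either direction.
    """
    # Collect every user that appears as a follower or a followee.
    users = set(follows)
    for followees in follows.values():
        users |= followees

    # Neighbors of u: anyone u follows or who follows u (self-loops ignored).
    neighbours = {u: set() for u in users}
    for u, followees in follows.items():
        for v in followees:
            if u != v:
                neighbours[u].add(v)
                neighbours[v].add(u)

    result = {}
    for u in users:
        k = len(neighbours[u])
        # A pair of u's neighbors closes a triangle when the two are adjacent.
        triangles = sum(
            1
            for v, w in combinations(sorted(neighbours[u]), 2)
            if w in neighbours[v]
        )
        possible = k * (k - 1) // 2
        result[u] = (triangles, triangles / possible if possible else 0.0)
    return result


def flag_suspects(ratios, tweet_counts, max_tweets=0, max_ratio=0.0):
    """Flag users with few tweets AND a low triangle ratio (thresholds are illustrative)."""
    return sorted(
        u
        for u, (_, ratio) in ratios.items()
        if tweet_counts.get(u, 0) <= max_tweets and ratio <= max_ratio
    )


# Toy data: a, b, c follow one another; s follows three otherwise unrelated users.
follows = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "s": {"a", "x", "y"}}
tweet_counts = {"a": 120, "b": 40, "c": 75, "s": 0, "x": 3, "y": 9}
ratios = local_triangle_ratio(follows)
suspects = flag_suspects(ratios, tweet_counts)
```

In this toy graph the account `s` has posted nothing and none of its neighbors are connected to each other, so it is the only one flagged. A real follow network would need thresholds tuned on labeled data and a counting scheme that scales beyond pairwise enumeration.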

Authors: Ding Zhaoyun, Zhou Bin, Jia Yan, Wang Xiang

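Affiliations: College of Information Systems and Management, National University of Defense Technology; Key Laboratory of Information Systems Engineering, National University of Defense Technology; College of Computer, National University of Defense Technology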

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 11, pp. 2336-2348 (13 pages)

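Funding: National Natural Science Foundation of China (60933005, 91124002, 61302144); National High Technology Research and Development Program of China, 863 Program (2010AA012505, 2011AA010702, 2012AA01A401, 2012AA01A402); National Key Technology R&D Program of China (2012BAH38B04, 2012BAH38B06); National "242" Information Security Program (2011A010); National Basic Research Program of China, 973 Program (2013CB329601)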

Keywords: spammer; trust propagation; triangle counting; microblog; social networks

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
