Detecting Marionette Microblog Users for Improved Information Credibility (Cited by: 2)


Abstract: In this paper, we propose to detect a special group of microblog users: the "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users, who access microblogs for information sharing or social communication, marionette users perform specific tasks to earn financial profit. For example, they follow certain users to increase their "statistical popularity", or retweet certain tweets to amplify their "statistical impact". The fabricated follower and retweet counts not only mislead normal users toward wrong information, but also seriously impair microblog-based applications such as hot tweet selection and expert finding. We study the problem of detecting marionette users on microblog platforms. This problem is challenging because puppeteers employ complicated strategies to generate marionette users whose behavior resembles that of normal users. To tackle this challenge, we take into account two types of discriminative information: 1) individual user tweeting behavior and 2) the social interactions among users. By integrating both types of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. Applying the proposed model to Sina Weibo, one of the most popular microblog platforms in China, we find that it detects marionette users with an F-measure close to 0.9. In addition, we apply the model to calculate the marionette ratio of the top 200 most followed microbloggers and the top 50 most retweeted posts on Sina Weibo. To accelerate detection and reduce the cost of feature generation, we further propose a light-weight model that uses fewer features to identify marionettes among retweeters.
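The abstract names three ideas that a small example can make concrete: combining a per-user behavior score with social links in a semi-supervised way, measuring detection quality with the F-measure, and computing a marionette ratio over a set of followers or retweeters. The sketch below is not the paper's probabilistic model. It swaps in a generic score-propagation scheme, and every name in it (`propagate_scores`, `marionette_ratio`, `f_measure`, the `alpha` blending weight, the 0.5 threshold) is an assumption made for illustration.

```python
from typing import Dict, List


def propagate_scores(
    prior: Dict[str, float],
    neighbors: Dict[str, List[str]],
    labels: Dict[str, int],
    alpha: float = 0.5,
    iterations: int = 20,
) -> Dict[str, float]:
    """Blend each user's behavior-based prior with the mean score of linked users.

    prior:     hypothetical P(marionette) from individual tweeting behavior
    neighbors: social interactions (follow or retweet links), as adjacency lists
    labels:    the few users with known labels (1 = marionette, 0 = normal);
               their scores stay clamped, which is the semi-supervised part
    """
    scores = {u: float(labels[u]) if u in labels else p for u, p in prior.items()}
    for _ in range(iterations):
        updated = {}
        for user, p in prior.items():
            if user in labels:
                updated[user] = float(labels[user])  # keep known labels fixed
                continue
            linked = [scores[v] for v in neighbors.get(user, []) if v in scores]
            if linked:
                updated[user] = (1 - alpha) * p + alpha * sum(linked) / len(linked)
            else:
                updated[user] = p  # isolated user: fall back to the prior
        scores = updated
    return scores


def marionette_ratio(scores: Dict[str, float], users: List[str],
                     threshold: float = 0.5) -> float:
    """Fraction of the given users (e.g. followers of one account) flagged as marionettes."""
    if not users:
        return 0.0
    flagged = sum(1 for u in users if scores.get(u, 0.0) >= threshold)
    return flagged / len(users)


def f_measure(predicted: Dict[str, int], truth: Dict[str, int]) -> float:
    """F1 score: harmonic mean of precision and recall on the marionette class."""
    tp = sum(1 for u, y in truth.items() if y == 1 and predicted.get(u) == 1)
    fp = sum(1 for u, y in truth.items() if y == 0 and predicted.get(u) == 1)
    fn = sum(1 for u, y in truth.items() if y == 1 and predicted.get(u) != 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): "a" is a known marionette, "c" a known normal user.
toy_prior = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
toy_links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
toy_scores = propagate_scores(toy_prior, toy_links, labels={"a": 1, "c": 0})
toy_ratio = marionette_ratio(toy_scores, ["a", "b", "c", "d"])
```

In this toy graph, the unlabeled user linked to a known marionette is pulled above its prior and the one linked to a known normal user is pulled below it, which is the intuition behind using social interactions alongside individual behavior.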

Authors: Wu Xian (吴贤), Fan Wei (范伟), Gao Jing (高晶), Feng Ziming (冯子明), Yu Yong (俞勇)

Affiliations: Department of Computer Science; Baidu Research Big Data Laboratory; Department of Computer Science and Engineering

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD; 2015, Issue 5, pp. 1082-1096 (15 pages)

Keywords: marionette microblog user; information credibility; fake follower; fake retweet

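Classification: TP316.81 [Automation and Computer Technology: Computer Software and Theory]; TN929.533 [Electronics and Telecommunications: Communication and Information Systems]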




