
Detecting Marionette Microblog Users for Improved Information Credibility (Cited by: 2)

Abstract: In this paper, we propose to detect a special group of microblog users: the "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users, who access microblogs for information sharing or social communication, marionette users perform specific tasks to earn financial profit. For example, they follow certain users to increase those users' "statistical popularity", or retweet certain tweets to amplify their "statistical impact". The fabricated follower and retweet counts not only steer normal users toward wrong information, but also seriously impair microblog-based applications such as hot tweet selection and expert finding. Detecting marionette users is challenging because puppeteers employ complicated strategies to generate marionette users whose behavior resembles that of normal users. To tackle this challenge, we take into account two types of discriminative information: 1) individual user tweeting behavior and 2) the social interactions among users. By integrating both types of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. Applying the proposed model to Sina Weibo, one of the most popular microblog platforms in China, we find that it detects marionette users with an F-measure close to 0.9. In addition, we apply the model to calculate the marionette ratio of the top 200 most followed microbloggers and the top 50 most retweeted posts on Sina Weibo. To speed up detection and reduce the cost of feature generation, we further propose a lightweight model that uses fewer features to identify marionettes among retweeters.
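The two quantities the abstract reports, the F-measure of the detector and the "marionette ratio" of an account's followers or a post's retweeters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0/1 label encoding (1 = marionette), and the definition of the ratio as the fraction of predicted marionettes are assumptions made here for illustration only.

```python
from typing import Iterable, Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (assumed encoding: 1 = marionette)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def marionette_ratio(predicted_labels: Iterable[int]) -> float:
    """Fraction of users predicted to be marionettes (assumed definition)."""
    labels = list(predicted_labels)
    if not labels:
        return 0.0
    return sum(labels) / len(labels)
```

Under these assumptions, `marionette_ratio` would be applied to the predicted labels of one account's followers or one post's retweeters, while `f_measure` would be evaluated on a separately labeled test set.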
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD; 2015, Issue 5, pp. 1082-1096 (15 pages)
Keywords: marionette microblog user; information credibility; fake follower; fake retweet

