Microblogging Spam Recognition Based on Co-Training Algorithm
Abstract: Microblogs carry a large volume of spam comments, which have adverse effects, so identifying them has become a topic of wide interest. Supervised learning faces two problems here: large-scale annotated data sets are hard to obtain, and spam comment recognition is not accurate enough. To address both, this paper proposes a microblog spam comment recognition method based on a semi-supervised co-training algorithm. The method builds an index system from two views, the comment text and the commenting user. For each view, seven classification methods are used to select a base classifier, and the selected base classifiers are then co-trained to recognize microblog spam comments. The experimental results show that the co-training algorithm achieves better recognition performance.
Authors: CAO Chunping; YANG Qinglin (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Intelligent Computer and Applications, 2020, No. 10, pp. 105-107, 111 (4 pages)
Funding: National Natural Science Foundation of China (61803264).
Keywords: microblog spam comment; semi-supervised; co-training; classifier
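
The two-view co-training idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the nearest-centroid base classifier is a stand-in (the abstract does not name the seven classification methods or the ones selected), the rule of adding the most confident pseudo-labels each round follows the standard co-training scheme, and every feature value, function name, and parameter below is hypothetical.

```python
class CentroidClassifier:
    """Stand-in base classifier (assumption): nearest class centroid."""

    def fit(self, X, y):
        # One centroid per class: the column-wise mean of that class's rows.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Euclidean distance to each centroid; the nearest centroid wins.
        dists = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        label = min(dists, key=dists.get)
        total = sum(dists.values())
        # Confidence grows as the winning centroid gets relatively closer.
        conf = 1.0 - dists[label] / total if total > 0 else 1.0
        return label, conf


def co_train(text_X, user_X, y, unl_text, unl_user, rounds=5, per_round=1):
    """Grow the labeled set with each view's most confident pseudo-labels."""
    text_X, user_X, y = list(text_X), list(user_X), list(y)
    pool = list(range(len(unl_text)))  # indices of still-unlabeled comments
    clf_text, clf_user = CentroidClassifier(), CentroidClassifier()
    for _ in range(rounds):
        if not pool:
            break
        clf_text.fit(text_X, y)
        clf_user.fit(user_X, y)
        chosen = {}  # unlabeled index -> pseudo-label
        for clf, view in ((clf_text, unl_text), (clf_user, unl_user)):
            scored = sorted(
                ((clf.predict_with_confidence(view[i]), i) for i in pool),
                key=lambda s: s[0][1],
                reverse=True,
            )
            for (label, _conf), i in scored[:per_round]:
                chosen.setdefault(i, label)  # first view to pick an item wins
        # Each pseudo-labeled comment joins BOTH views' training sets.
        for i, label in chosen.items():
            text_X.append(unl_text[i])
            user_X.append(unl_user[i])
            y.append(label)
        pool = [i for i in pool if i not in chosen]
    clf_text.fit(text_X, y)
    clf_user.fit(user_X, y)
    return clf_text, clf_user


def predict(clf_text, clf_user, x_text, x_user):
    """Final label comes from whichever view is more confident."""
    a = clf_text.predict_with_confidence(x_text)
    b = clf_user.predict_with_confidence(x_user)
    return max(a, b, key=lambda p: p[1])[0]


# Hypothetical toy data. Text view: [link count, length / 100];
# user view: [follower ratio, account age in years]. 1 = spam, 0 = normal.
labeled_text = [[3.0, 0.2], [0.0, 1.5]]
labeled_user = [[0.1, 0.2], [0.9, 3.0]]
labels = [1, 0]
unl_text = [[2.5, 0.3], [0.1, 1.2], [2.8, 0.1], [0.2, 1.8]]
unl_user = [[0.2, 0.1], [0.8, 2.5], [0.15, 0.3], [0.95, 2.8]]

clf_text, clf_user = co_train(labeled_text, labeled_user, labels, unl_text, unl_user)
spam_guess = predict(clf_text, clf_user, [2.9, 0.2], [0.1, 0.2])
normal_guess = predict(clf_text, clf_user, [0.0, 1.6], [0.9, 2.9])
```

Only two comments start out labeled; the four unlabeled ones are absorbed over successive rounds, which is the point of the semi-supervised setting. In practice the stand-in classifier would be replaced by whichever base classifier performs best on each view.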
