Microblogging Spam Recognition Based on Co-Training Algorithm (基于协同训练算法的微博垃圾评论识别)


Abstract: Microblog platforms contain large numbers of spam comments, which have adverse effects, and identifying them has become a topic of wide interest. Under a supervised learning framework, large-scale annotated datasets are hard to obtain and spam comment recognition is not accurate enough. To address these two problems, this paper proposes a microblog spam comment recognition method based on a semi-supervised co-training algorithm. The method builds an indicator system from two views: the comment text and the commenting user. For each view, seven classification methods are compared to select a base classifier, and the selected base classifiers are then co-trained to recognize microblog spam comments. Experimental results show that the co-training algorithm achieves better recognition performance.
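The two-view co-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid `CentroidClassifier` is a toy stand-in for whichever base classifier the paper selects per view, the feature values are synthetic, and the names `co_train`, `text_view`, `user_view`, `rounds`, and `per_round` are all hypothetical. The loop follows the general co-training idea: each view's classifier pseudo-labels the unlabeled comments it is most confident about, and those comments join the shared labeled set used to retrain both classifiers.

```python
class CentroidClassifier:
    """Toy base learner (assumption): predicts the class with the nearest centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_margin(self, x):
        # Margin = gap between the farthest and nearest centroid distances;
        # a larger margin is treated as higher confidence.
        dist = {label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
                for label, c in self.centroids.items()}
        best = min(dist, key=dist.get)
        return best, max(dist.values()) - dist[best]


def co_train(views, seed_labels, pool, rounds=4, per_round=2):
    """views: one feature matrix per view; seed_labels: {index: 0 or 1}."""
    labels = dict(seed_labels)      # shared labeled set, grows with pseudo-labels
    pool = list(pool)               # indices of still-unlabeled comments
    clfs = [CentroidClassifier() for _ in views]

    def refit():
        idx = sorted(labels)
        for clf, view in zip(clfs, views):
            clf.fit([view[i] for i in idx], [labels[i] for i in idx])

    refit()
    for _ in range(rounds):
        if not pool:
            break
        for clf, view in zip(clfs, views):
            # Each view's classifier labels the comments it is most sure about.
            ranked = sorted(pool, key=lambda i: clf.predict_with_margin(view[i])[1],
                            reverse=True)
            for i in ranked[:per_round]:
                labels[i] = clf.predict_with_margin(view[i])[0]
                pool.remove(i)
        refit()                     # both classifiers see the enlarged labeled set
    return clfs, labels


# Synthetic data (assumption): 1 = spam, 0 = normal; two features per view.
truth = [i % 2 for i in range(20)]
jitter = [(i % 5) * 0.02 for i in range(20)]
text_view = [[t + j, t - j] for t, j in zip(truth, jitter)]   # comment-text features
user_view = [[2 * t + j, j] for t, j in zip(truth, jitter)]   # comment-user features

seed = {i: truth[i] for i in range(4)}                        # only 4 labeled comments
clfs, labels = co_train([text_view, user_view], seed, range(4, 20))
agreement = sum(labels[i] == truth[i] for i in range(20))
print(agreement, "of", len(truth), "labels match the toy ground truth")
```

In this sketch the confidence score is a simple distance margin; a probabilistic base classifier would more naturally use its predicted class probability, and `per_round` controls how quickly pseudo-labels (and any labeling errors) enter the training set.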

Authors: CAO Chunping; YANG Qinglin (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 10, pp. 105-107, 111 (4 pages)

Funding: National Natural Science Foundation of China (61803264).

Keywords: microblog; spam comment; semi-supervised; co-training; classifier

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]


