Abstract
Microblogs carry a large volume of spam comments, which have adverse effects, so identifying spam comments has become an active research topic. Supervised learning faces two problems here: large-scale labeled datasets are hard to obtain, and spam comment identification is not accurate enough. To address both, this paper proposes a microblog spam comment identification method based on a semi-supervised co-training algorithm. The method builds an index (feature) system from two views, the comment text and the commenting user. For each view, seven classification methods are used to select a base classifier, and the selected base classifiers are then co-trained to identify microblog spam comments. Experimental results show that the co-training algorithm achieves better identification performance.
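The two-view co-training loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nearest-centroid `CentroidClassifier` merely stands in for whichever base classifier the authors select from their seven candidates, the margin-based confidence is one simple choice among many, and the toy feature values (for example link count or comments per day) are hypothetical.

```python
class CentroidClassifier:
    """Nearest-centroid classifier, used here only as a stand-in base classifier."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Euclidean distance from x to each class centroid.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        ranked = sorted(dist, key=dist.get)
        best = ranked[0]
        # Confidence is the margin between the nearest and second-nearest centroid.
        margin = dist[ranked[1]] - dist[best] if len(ranked) > 1 else 0.0
        return best, margin


def fit_views(views, labels):
    """Train one classifier per view on the currently labeled samples."""
    labeled = [i for i, y in enumerate(labels) if y is not None]
    return [
        CentroidClassifier().fit([v[i] for i in labeled], [labels[i] for i in labeled])
        for v in views
    ]


def co_train(text_view, user_view, labels, rounds=5, per_round=2):
    """labels[i] is 1 (spam), 0 (normal), or None for an unlabeled comment."""
    labels = list(labels)
    views = (text_view, user_view)
    for _ in range(rounds):
        unlabeled = [i for i, y in enumerate(labels) if y is None]
        if not unlabeled:
            break
        models = fit_views(views, labels)
        newly_labeled = {}
        for model, view in zip(models, views):
            scored = [(model.predict_with_confidence(view[i]), i) for i in unlabeled]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            # Each view pseudo-labels its most confident samples for the shared pool;
            # if both views pick the same sample, the text view's label is kept.
            for (pred, _margin), i in scored[:per_round]:
                newly_labeled.setdefault(i, pred)
        for i, pred in newly_labeled.items():
            labels[i] = pred
    return fit_views(views, labels), labels


def predict(models, text_x, user_x):
    """Return the label from whichever view is more confident."""
    results = [m.predict_with_confidence(x) for m, x in zip(models, (text_x, user_x))]
    return max(results, key=lambda r: r[1])[0]


# Hypothetical toy data: samples 0-3 are spam, samples 4-7 are normal comments.
text_view = [[3.0, 0.8], [4.0, 0.9], [3.5, 0.7], [5.0, 0.9],
             [0.0, 0.1], [0.0, 0.2], [1.0, 0.1], [0.0, 0.0]]
user_view = [[0.1, 9.0], [0.2, 8.0], [0.1, 7.5], [0.3, 9.5],
             [2.0, 1.0], [3.0, 0.5], [2.5, 1.5], [1.5, 1.0]]
initial_labels = [1, None, None, None, 0, None, None, None]

models, final_labels = co_train(text_view, user_view, initial_labels)
```

Only two comments start out labeled; the remaining six receive pseudo-labels over successive rounds, which is the property that makes co-training attractive when annotated data is scarce. A real system would replace the toy classifier with properly validated models and would cap how many pseudo-labels are accepted per round to limit error propagation.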
Authors
CAO Chunping (曹春萍)
YANG Qinglin (杨青林)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2020, No. 10, pp. 105-107, 111 (4 pages in total)
Funding
National Natural Science Foundation of China (61803264).
Keywords
Microblog spam comments
Semi-supervised
Co-training
Classifier