Abstract
With the arrival of the information age, text data on Internet platforms has grown explosively, and it inevitably includes some illegal content. Because this content is usually buried in massive volumes of data, it is harder for platforms to find it, and traditional text classification methods can no longer meet the need. Based on the characteristics of text data, this paper therefore proposes an active-learning-based SVM method for classifying comment content. The method uses the idea of active learning to combine sensitive-word vectors, the k-means clustering algorithm, and the SVM classification algorithm, improving text classification accuracy while using a smaller training set. Experimental results show that classifying text with the proposed method yields some improvement in both classification time and accuracy.
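The abstract names the ingredients (sensitive-word vectors, k-means, SVM, active learning) but not how they are wired together. The following is a minimal sketch of one plausible reading, not the authors' implementation: comments are turned into sensitive-word count vectors, k-means picks one representative per cluster as the initial labeled set, and the loop then asks for a label on the sample closest to the current SVM hyperplane. The lexicon, toy comments, labels, function names, and parameters are all hypothetical, and the linear SVM is a small Pegasos-style SGD stand-in written with the standard library only.

```python
import random

# Hypothetical sensitive-word lexicon (illustrative; not taken from the paper).
SENSITIVE_WORDS = ["scam", "fraud", "gamble", "weapon"]

def vectorize(text):
    """Sensitive-word count vector with a trailing constant bias feature."""
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in SENSITIVE_WORDS] + [1.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (assignments, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

def train_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM via Pegasos-style SGD on the hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            w = [wj * (1 - eta * lam) for wj in w]
            if margin < 1:  # sample violates the margin: take a hinge step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

def active_learn(X, oracle, k=2, budget=6):
    """Seed with one labeled sample per k-means cluster, then repeatedly
    label the unlabeled sample closest to the current SVM hyperplane."""
    assign, centers = kmeans(X, k)
    labeled = {}
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            seed_idx = min(members, key=lambda j: sqdist(X[j], centers[c]))
            labeled[seed_idx] = oracle(seed_idx)
    w = train_svm([X[i] for i in labeled], list(labeled.values()))
    while len(labeled) < min(budget, len(X)):
        pool = [i for i in range(len(X)) if i not in labeled]
        query = min(pool, key=lambda j: abs(dot(w, X[j])))
        labeled[query] = oracle(query)
        w = train_svm([X[i] for i in labeled], list(labeled.values()))
    return w, labeled

# Toy data: +1 marks a comment that should be flagged, -1 a benign one.
comments = [
    "great product fast delivery",
    "this is a scam and a fraud",
    "love it works well",
    "gamble here and win big",
    "buy a weapon cheap",
    "nice quality will order again",
    "total fraud do not trust",
    "arrived on time thanks",
]
truth = [-1, 1, -1, 1, 1, -1, 1, -1]

X = [vectorize(c) for c in comments]
w, labeled = active_learn(X, oracle=lambda i: truth[i], k=2, budget=6)
preds = [predict(w, x) for x in X]
print(f"labeled {len(labeled)} of {len(X)} comments")
print(preds)
```

Here the `oracle` callback stands in for a human annotator, and `budget` caps how many labels are requested, which is where the "smaller training set" saving would come from. A practical system would more likely use a library SVM and a much richer lexicon and feature set.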
Authors
DUAN Youxiang (段友祥)
ZHANG Xiaotian (张晓天)
School of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, Issue 3, pp. 608-612 (5 pages)