Abstract
This paper studies the classification of product reviews on Chinese microblogs. A conventional support vector machine (SVM) classifier trained on only a small amount of labeled data does not generalize well enough, so it cannot be applied directly to mining microblog text. The traditional semi-supervised TSVM algorithm adapts the SVM by adding a penalty term for the unlabeled data, but this produces a non-convex optimization problem. This paper therefore studies a semi-supervised SVM classification algorithm whose kernel is based on a Gaussian mixture model. The Gaussian mixture model is trained on both labeled and unlabeled data to obtain their probability distribution, so that the SVM can make as much use as possible of the cluster information carried by the unlabeled data. Finally, an example based on reviews of the iPhone is analyzed to verify the advantages of the proposed method.
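The abstract does not state the form of the kernel. As a hedged illustration only, and not the paper's own definition, one common way to build a kernel from a Gaussian mixture fitted on labeled and unlabeled data together is to compare the posterior component memberships of two samples. The symbols K, \pi_k, \mu_k, \Sigma_k and \gamma_k below are assumptions introduced for this sketch.

```latex
% Assumed notation: K mixture components with weights \pi_k,
% means \mu_k and covariances \Sigma_k, fitted on all samples.
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

% Posterior probability (responsibility) of component k for sample x.
\gamma_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}

% Illustrative kernel: inner product of the responsibility vectors.
k_{\mathrm{GMM}}(x, x') = \sum_{k=1}^{K} \gamma_k(x) \, \gamma_k(x')
```

Under this assumed form the kernel is an inner product of responsibility vectors, so it is positive semidefinite and the SVM training problem remains a convex quadratic program. Two samples that the mixture assigns to the same component with high probability receive a kernel value close to 1, which is how cluster structure in the unlabeled data could reach the classifier.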
Author
张燕
ZHANG Yan (College of Educational Science, Xinjiang Normal University, Urumqi 830017, China)
Source
《现代电子技术》
Peking University Core Journal (北大核心)
2016, Issue 14, pp. 77-79, 83 (4 pages)
Modern Electronics Technique
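Funding
National Natural Science Foundation of China, Regional Science Fund Project (41561100)
Social Science Fund of the Xinjiang Uygur Autonomous Region, General Project (14BGL041)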
Keywords
microblog
product evaluation
data mining
support vector machine
semi-supervised learning