Abstract
Most current research on sentiment classification of online reviews works with imbalanced sample data, in which positive samples generally far outnumber negative ones; classifying such imbalanced sets tends to produce large errors on the minority class. Moreover, because online reviews are expressed in highly varied forms, large amounts of labeled data are hard to obtain. To address these problems, this paper studies unsupervised sentiment classification of imbalanced online reviews. First, the denoising autoencoder is improved to raise the feature values of the minority class, which keeps classified samples from drifting toward the majority class. The extracted features are then used as the input to the k-means algorithm, yielding an unsupervised classification of the samples. Experiments show that the algorithm adapts well to samples with a high imbalance ratio, which supports its effectiveness.
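The two-stage pipeline described in the abstract (denoising autoencoder features, then k-means) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the abstract does not say how the denoising autoencoder was modified to strengthen minority-class features, so the code uses a plain single-layer denoising autoencoder with tied weights and masking noise, and does not reproduce the paper's improvement. The function names (`train_dae`, `kmeans`), the hyperparameters, and the toy binary "review" vectors are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=8, noise=0.3, lr=0.5, epochs=300, seed=0):
    """Train a plain denoising autoencoder (tied weights) and return its encoder.

    Assumption: this is a standard DAE, not the improved variant from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(d)          # decoder bias
    for _ in range(epochs):
        keep = rng.random(X.shape) > noise       # masking noise: drop ~30% of inputs
        Xn = X * keep
        H = sigmoid(Xn @ W + b)                  # hidden code of the corrupted input
        R = sigmoid(H @ W.T + c)                 # reconstruction of the clean input
        # Gradients of the mean squared reconstruction error (1/2n) * sum ||R - X||^2
        dR = (R - X) * R * (1.0 - R) / n
        dH = (dR @ W) * H * (1.0 - H)
        W -= lr * (Xn.T @ dH + dR.T @ H)         # encoder and decoder share W
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return lambda Z: sigmoid(Z @ W + b)

def kmeans(F, k=2, iters=50, seed=0):
    """Basic Lloyd's k-means; returns one cluster label per row of F."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)]
    for _ in range(iters):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters where they are
                centers[j] = F[labels == j].mean(axis=0)
    return labels

# Toy imbalanced data: 90 "positive" and 10 "negative" binary term vectors (hypothetical).
rng = np.random.default_rng(1)
p_pos = np.r_[np.full(10, 0.8), np.full(10, 0.1)]
p_neg = np.r_[np.full(10, 0.1), np.full(10, 0.8)]
X = np.vstack([
    (rng.random((90, 20)) < p_pos).astype(float),
    (rng.random((10, 20)) < p_neg).astype(float),
])

encode = train_dae(X)            # stage 1: learn features without labels
labels = kmeans(encode(X), k=2)  # stage 2: cluster the learned features
```

The cluster indices returned by `kmeans` are arbitrary, so mapping them to "positive" and "negative" would still require inspecting a few samples from each cluster.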
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2014, Issue 12, pp. 232-235 (4 pages)
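Funding
Research on the Integration of Industrialization and Informatization in Underdeveloped Regions and Its System Dynamics Mechanism (11FJL007)
Supported by the Humanities and Social Sciences Research Project of the Guangxi Department of Education (SK13YB069)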
Keywords
sentiment classification
deep learning
denoising autoencoder
imbalanced data