Abstract
Text sentiment classification is held back by two problems: emotional semantic features are under-used, and feature dimension reduction is ineffective. To address them, this paper proposed a high-precision sentiment classification method for online comments that expands semantically similar emotional words and introduces statistical features between words. First, a neural-network skip-gram model was used to generate word embeddings, and words judged semantically similar by an embedding similarity measure were added as emotional features. Next, statistical features between words were used to reduce the feature dimension. Finally, online comments were classified by an Adaboost classification model built by weighting multiple weak classifiers. Experiments on public test sets of hotel reviews and mobile phone reviews show that the method reaches sentiment classification accuracies of 90.96% and 93.67%, respectively. Expanding semantically similar emotional words enriches the emotional semantic features of the text, and introducing statistical features between words gives better feature dimension reduction; together, the two steps further improve text sentiment classification.
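The abstract names the pipeline steps but gives no formulas, so the following Python sketch illustrates two of them under stated assumptions: expanding seed sentiment words by cosine similarity of word embeddings, and a discrete AdaBoost that combines weighted weak classifiers. The toy 2-d vectors stand in for skip-gram output; the 0.9 similarity threshold, the function names, and the use of one-feature decision stumps as weak classifiers are hypothetical choices for illustration, not the authors' settings. The statistics-based dimension-reduction step is omitted because the abstract does not say which statistic is used.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Stump = Tuple[float, int, int]  # (alpha, feature index, polarity)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_sentiment_words(seeds: List[str], emb: Dict[str, Vector],
                           threshold: float = 0.9) -> Set[str]:
    """Hypothetical expansion rule: keep the seeds and add every vocabulary
    word whose cosine similarity to at least one seed is >= threshold."""
    expanded = set(seeds)
    for word, vec in emb.items():
        if any(s in emb and cosine(vec, emb[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded


def featurize(tokens: List[str], vocab: List[str]) -> List[int]:
    """Binary presence vector of the expanded sentiment words in one comment."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]


def stump_predict(x: List[int], j: int, polarity: int) -> int:
    """Weak classifier (assumed): output `polarity` if feature j is present, else -polarity."""
    return polarity if x[j] else -polarity


def train_adaboost(X: List[List[int]], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost over one-feature decision stumps; labels are -1/+1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble: List[Stump] = []
    for _ in range(rounds):
        # pick the stump with the lowest weighted training error
        best = None
        for j in range(d):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, polarity))
        # misclassified samples gain weight, correct ones lose it; renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Stump], x: List[int]) -> int:
    """Sign of the alpha-weighted vote of all stumps (+1 positive, -1 negative)."""
    score = sum(alpha * stump_predict(x, j, polarity) for alpha, j, polarity in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # toy 2-d "embeddings" standing in for skip-gram vectors
    emb = {"good": [0.9, 0.1], "great": [0.85, 0.2], "bad": [-0.9, 0.1],
           "awful": [-0.8, 0.25], "room": [0.1, 0.95]}
    vocab = sorted(expand_sentiment_words(["good", "bad"], emb))
    reviews = [["good", "great", "room"], ["great", "staff"],
               ["bad", "awful", "room"], ["awful", "service"]]
    labels = [1, 1, -1, -1]
    X = [featurize(r, vocab) for r in reviews]
    model = train_adaboost(X, labels, rounds=5)
    print(vocab, predict(model, featurize(["great", "service"], vocab)))
```

In the toy run, "great" and "awful" join the feature set because they lie close to the seeds "good" and "bad", while the neutral "room" does not. With real data the vectors would come from a skip-gram model trained on the review corpus (for example gensim's `Word2Vec` with `sg=1`), and a library implementation of AdaBoost would normally replace the hand-written loop.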
Authors
LUO Sen-lin (罗森林)
MAO Yan-ying (毛焱颖)
PAN Li-min (潘丽敏)
CHEN Qian-rou (陈倩柔)
WEI Chao (魏超)
Information System and Security & Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081, China
Source
Transactions of Beijing Institute of Technology (《北京理工大学学报》), 2018, No. 11, pp. 1156-1162, 1176 (8 pages in total)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
Basic Research Fund of Beijing Institute of Technology (20160542013)
National "242" Program project (2017A149)
Keywords
word embedding
Adaboost classification model
feature selection
Chinese comment
sentiment classification