Abstract
Text sentiment analysis is a typical natural language processing task, but the accuracy of existing sentiment analysis methods is not high, and the way words are represented as features is one important reason. This paper proposes a combined weighting method for short text features (CWSTF) that can effectively improve the accuracy of sentiment analysis. CWSTF first uses a random forest to evaluate each feature's contribution to sentiment, ranks the features by that contribution, and selects features according to the ranking. It then computes the importance of each feature in the document with TF-IDF (Term Frequency-Inverse Document Frequency) and determines the final weight of the feature from both its importance in the document and its sentiment contribution. Finally, four classifiers, SVM (Support Vector Machine), NB (Naive Bayes), ME (Maximum Entropy), and KNN (K-Nearest Neighbor), are used in comparison experiments. The experimental results show that features processed by the proposed method improve sentiment classification accuracy more effectively than the other methods compared.
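The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two scores are combined, so the product of TF-IDF and contribution used here is an assumption, as are the names `tfidf`, `combined_weights`, and `top_k` and the toy contribution values. In practice the contribution scores would come from a trained random forest (for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`); here they are passed in as a plain dictionary so the sketch stays self-contained.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def combined_weights(docs, contribution, top_k):
    """Keep the top_k terms by contribution, then scale TF-IDF by contribution.

    Multiplying the two scores is an assumption made for illustration;
    the abstract only says both are used to determine the weight.
    """
    ranked = sorted(contribution, key=contribution.get, reverse=True)
    selected = set(ranked[:top_k])
    return [
        {t: w * contribution[t] for t, w in doc_w.items() if t in selected}
        for doc_w in tfidf(docs)
    ]

# Toy corpus of tokenized short texts (hypothetical data).
docs = [
    ["great", "phone", "great", "screen"],
    ["terrible", "phone", "battery"],
    ["great", "battery"],
]
# Hypothetical sentiment-contribution scores, standing in for
# random-forest feature importances.
contribution = {
    "great": 0.5, "terrible": 0.3, "battery": 0.1,
    "screen": 0.06, "phone": 0.04,
}
features = combined_weights(docs, contribution, top_k=3)
```

Each resulting dictionary could then be turned into a feature vector and passed to any of the classifiers named in the abstract (SVM, NB, ME, KNN). Terms outside the top-ranked set are dropped before weighting, which mirrors the selection-then-weighting order the abstract describes.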
Authors
Tan You-xin (谭有新)
Teng Shao-hua (滕少华)
School of Computers, Guangdong University of Technology, Guangzhou 510006, China
Source
Journal of Guangdong University of Technology (《广东工业大学学报》)
CAS
2020, Issue 5, pp. 51-61 (11 pages)
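Funding
National Natural Science Foundation of China (61972102)
Guangdong Provincial Science and Technology Plan Projects (2016B010108007, 2019B110210002, 2019B020208001)
Guangdong Provincial Department of Education Project (粤教高函〔2018〕179号)
Guangzhou Science and Technology Plan Projects (201802010042, 201802030011, 201802010026, 201903010107)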
Keywords
sentiment analysis
feature selection
combined weighting