Abstract
Sentiment classification aims to classify the sentiment expressed in a text. To improve sentence-level sentiment classification, this paper proposes a method based on ensemble learning over heterogeneous classifiers. The ensemble combines three mainstream sentiment classification methods built on two kinds of text feature extraction: a long short-term memory (LSTM) network on word embeddings, a convolutional neural network (CNN) on word embeddings, and logistic regression (LR) on term frequency-inverse document frequency (TF-IDF) features. To avoid the rigidity of traditional voting and the problem that soft voting cannot be applied to heterogeneous classifiers in ensemble learning, a regression model is trained on the labels output by the individual classifiers, so that the information contributed by each classifier is used as fully as possible. Experimental results show that the proposed method improves accuracy and F1 by 4.7% and 5.2%, respectively, over the three mainstream methods, and that it also outperforms traditional voting.
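The idea of learning a regression over the base classifiers' output labels can be illustrated with a small sketch. The abstract does not specify the regression model or training setup, so everything below is an assumption: the combiner is taken to be a logistic regression trained by plain gradient descent, the names `fit_meta_regressor`, `predict`, and `majority_vote` are hypothetical, and the toy labels are invented purely for illustration (they are not the paper's data). Each row holds the hard 0/1 labels from three stand-in base classifiers (in the order LSTM, CNN, TF-IDF + LR).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_regressor(base_labels, targets, lr=0.5, epochs=2000):
    """Fit a logistic regression on the hard labels of the base classifiers.

    base_labels: list of tuples, one 0/1 label per base classifier.
    targets:     list of true 0/1 sentiment labels.
    Returns the learned weights (one per base classifier) and the bias.
    """
    n_models = len(base_labels[0])
    n = len(targets)
    w = [0.0] * n_models
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, t in zip(base_labels, targets):
            # Gradient of the log loss with respect to the logit is (p - t).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        # Batch gradient descent step on the mean loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Combine one row of base-classifier labels into a final 0/1 label."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


def majority_vote(x):
    """Hard voting baseline: the label chosen by more than half the models."""
    return 1 if 2 * sum(x) > len(x) else 0


# Invented toy data (assumption): the first base classifier happens to be
# always right, while the other two are uninformative.
TOY_LABELS = [
    (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1),
    (0, 1, 1), (0, 0, 1), (0, 1, 0), (0, 0, 0),
]
TOY_TARGETS = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_meta_regressor(TOY_LABELS, TOY_TARGETS)

for row, truth in zip(TOY_LABELS, TOY_TARGETS):
    print(row, "truth:", truth,
          "vote:", majority_vote(row),
          "learned:", predict(w, b, row))
```

On this toy data the learned combiner weights the reliable classifier heavily, so it labels the rows `(1, 0, 0)` and `(0, 1, 1)` correctly, whereas majority voting follows the two uninformative classifiers and gets both wrong. Because the combiner only consumes output labels, it does not require the base classifiers to produce comparable probability scores.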
Authors
LI Di (李迪)
JI Chunlei (计春雷)
LIU Song (刘松)
College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; Intelligent Equipment Software and Big Data Analysis Laboratory, Shanghai DianJi University, Shanghai 201306, China
Source
Engineering Journal of Wuhan University (《武汉大学学报(工学版)》)
CAS
CSCD
PKU Core Journals (北大核心)
2021, No. 10, pp. 975-982 (8 pages)
Funding
Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61702320).
Keywords
sentiment classification
heterogeneous classifier
ensemble learning