Abstract
Text sentiment classification analyzes and reasons over subjective, sentiment-bearing text to help users make better judgments and decisions. To address the difficulty traditional sentiment classification models have in adjusting word vectors according to context, a dual-channel text sentiment classification model is proposed. Pretrained ELMo and GloVe models generate dynamic and static word vectors, respectively, and the two kinds of word vectors are stacked to form the input vector. A self-attention mechanism then processes the input vector and computes the dependencies between words within the text. A dual-channel neural network combining a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU) extracts local and global text features simultaneously. Finally, the outputs of the two channels are concatenated, passed through a fully connected layer, and fed to a classifier to obtain the sentiment classification result. Experimental results show that, compared with H-BiGRU, the better-performing model among comparable sentiment classification models, the proposed ELMo-CNN-BiGRU model improves accuracy on the IMDB, Yelp, and Sentiment140 datasets by 2.42, 1.98, and 2.52 percentage points, respectively, and the F1 score by 2.40, 1.94, and 2.43 percentage points, respectively, indicating better sentiment classification performance and stability on short texts.
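The data flow described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: random arrays stand in for the ELMo and GloVe outputs, all dimensions are toy values, the self-attention is an unparameterised scaled dot-product, the BiGRU channel keeps only the final hidden state of each direction, and the fully connected layer plus softmax classifier is collapsed into one linear layer. Every function and parameter name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_attention(x):
    # Scaled dot-product self-attention over a (T, d) sequence, no learned projections.
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

def cnn_channel(x, filters):
    # filters: (n_filters, k, d). Valid 1-D convolution over time, ReLU, max-over-time pooling.
    k = filters.shape[1]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])
    feats = np.einsum('wkd,fkd->wf', windows, filters)
    return np.maximum(feats, 0.0).max(axis=0)

def gru_last_state(x, W, U, b):
    # W: (3, d, h), U: (3, h, h), b: (3, h) for update gate, reset gate, candidate state.
    h = np.zeros(U.shape[-1])
    for x_t in x:
        z = sigmoid(x_t @ W[0] + h @ U[0] + b[0])
        r = sigmoid(x_t @ W[1] + h @ U[1] + b[1])
        h_new = np.tanh(x_t @ W[2] + (r * h) @ U[2] + b[2])
        h = (1.0 - z) * h + z * h_new
    return h

def bigru_channel(x, fwd, bwd):
    # Run one GRU left-to-right and another right-to-left, then join the final states.
    return np.concatenate([gru_last_state(x, *fwd), gru_last_state(x[::-1], *bwd)])

def init_gru(d, h):
    return (rng.normal(0, 0.1, (3, d, h)), rng.normal(0, 0.1, (3, h, h)), np.zeros((3, h)))

def classify(elmo_vecs, glove_vecs, params):
    x = np.concatenate([elmo_vecs, glove_vecs], axis=1)    # stack dynamic + static vectors
    x = self_attention(x)                                   # word-to-word dependencies
    local_feats = cnn_channel(x, params['filters'])         # local features
    global_feats = bigru_channel(x, params['fwd'], params['bwd'])  # global features
    fused = np.concatenate([local_feats, global_feats])     # join the two channels
    return softmax(fused @ params['W_out'] + params['b_out'])

# Toy sizes (assumptions): 12 tokens, 16-dim "ELMo", 8-dim "GloVe", 2 classes.
T, d_elmo, d_glove, n_filters, k, hidden, n_classes = 12, 16, 8, 6, 3, 5, 2
d = d_elmo + d_glove
params = {
    'filters': rng.normal(0, 0.1, (n_filters, k, d)),
    'fwd': init_gru(d, hidden),
    'bwd': init_gru(d, hidden),
    'W_out': rng.normal(0, 0.1, (n_filters + 2 * hidden, n_classes)),
    'b_out': np.zeros(n_classes),
}
elmo_vecs = rng.normal(size=(T, d_elmo))    # placeholder for contextual (dynamic) vectors
glove_vecs = rng.normal(size=(T, d_glove))  # placeholder for static vectors
probs = classify(elmo_vecs, glove_vecs, params)
print(probs)  # class probabilities for one toy sentence
```

With untrained random weights the output probabilities carry no meaning; the sketch only shows how the stacked embeddings pass through self-attention and the two channels before being fused for classification.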
Authors
WU Di (吴迪)
WANG Ziyu (王梓宇)
ZHAO Weichao (赵伟超)
College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2022, Issue 8, pp. 105-112 (8 pages)
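Funding
National Key R&D Program of China, "Science and Technology Winter Olympics" key special project, sub-project "Comprehensive Public Safety Risk Assessment Technology for the Winter Olympics" (2018YFF0301004-02)
Natural Science Foundation of Hebei Province, "Research on Topic Model Clustering Methods for Microblog Short Texts" (F2020402003)
Natural Science Foundation of Hebei Province, "Research on Performance Analysis and Prediction Methods for Cloud Computing Centers in Big Data Applications" (F2019402428)
Key Science and Technology Research Project of Higher Education Institutions of Hebei Province, "Research on Network Intrusion Detection Methods Based on Incremental Sequential Pattern Matching" (ZD2018087)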
Keywords
text sentiment classification
dual channel
pretrained model
deep learning
self-attention mechanism