Abstract
With the explosive growth of emotionally charged short texts on social network platforms, fine-grained sentiment classification of such texts with deep learning has become a research hotspot in natural language processing in recent years. When applied to fine-grained sentiment classification, traditional sentiment classification algorithms often extract insufficient semantic feature information from the text and ignore the prior distribution of the dataset. To address this, the paper makes improvements from both the model and the data perspective and proposes a deep learning model that incorporates alternating normalization. The model combines the pre-trained RoBERTa model with a bidirectional long short-term memory network (Bi-LSTM) to extract deep global semantic features of the text, and uses a self-attention mechanism to adjust weights dynamically. The output for each sample is classified as high-confidence or low-confidence according to its entropy, and the low-confidence samples are corrected one by one using the alternating normalization method together with the prior distribution of the dataset to obtain the final result. Cross-lingual comparative experiments on the SMP2020-EWWCT Chinese competition dataset and the SemEval 2014 Task 4 English public dataset show that the model achieves significant performance improvements over mainstream deep learning models.
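The correction step described in the abstract (an entropy-based split into high- and low-confidence samples, followed by alternating normalization against the class prior) can be illustrated with a minimal sketch. It follows the generic alternating-normalization recipe and is not taken from the paper: the function names, the normalized-entropy threshold of 0.9, and the `alpha` and `iters` defaults are all assumptions chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of one probability vector, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def alternating_normalization(confident, uncertain, prior, alpha=1.0, iters=1):
    """Correct one low-confidence prediction using the class prior.

    Assumes strictly positive probabilities (e.g. softmax outputs).
    """
    # Stack the high-confidence predictions with the single uncertain one.
    rows = [list(r) for r in confident] + [list(uncertain)]
    n_classes = len(prior)
    for _ in range(iters):
        # Column step: sharpen with alpha, then make each class column sum to 1.
        rows = [[p ** alpha for p in r] for r in rows]
        col_sums = [sum(r[j] for r in rows) for j in range(n_classes)]
        rows = [[r[j] / col_sums[j] for j in range(n_classes)] for r in rows]
        # Row step: reweight by the prior, then renormalize each sample row.
        rows = [[r[j] * prior[j] for j in range(n_classes)] for r in rows]
        rows = [[v / sum(r) for v in r] for r in rows]
    # The last row is the corrected low-confidence prediction.
    return rows[-1]


def correct_predictions(preds, prior, threshold=0.9, alpha=1.0, iters=1):
    """Leave confident predictions unchanged; correct the rest one by one."""
    confident = [p for p in preds if entropy(p) < threshold]
    corrected = []
    for p in preds:
        if entropy(p) >= threshold and confident:
            p = alternating_normalization(confident, p, prior, alpha, iters)
        corrected.append(list(p))
    return corrected


# Hypothetical two-class example with a uniform prior.
preds = [[0.95, 0.05], [0.9, 0.1], [0.05, 0.95], [0.5, 0.5]]
result = correct_predictions(preds, prior=[0.5, 0.5])
print(result[-1])
```

In this toy example the three confident predictions lean toward class 0, which conflicts with the uniform prior, so the ambiguous last sample is shifted toward class 1 (approximately `[0.4, 0.6]`). Each uncertain sample is corrected independently against the same confident set, which mirrors the "one by one" wording of the abstract.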
Authors
ZHOU Yanling (周艳玲)
LAN Zhengyin (兰正寅)
ZHANG Yan (张䶮)
LIU Siyao (刘司摇)
School of Artificial Intelligence, Hubei University, Wuhan, Hubei 430062, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2023, No. 9, pp. 140-149 (10 pages)
Funding
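National Natural Science Foundation of China (61977021)
Science and Technology Research Project of the Hubei Provincial Department of Education (D20221006)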
Keywords
sentiment classification
deep learning
alternating normalization