Abstract
To address noise interference caused by typos, grammatical errors, and Internet-culture slang, this paper studies multimodal-fusion emotion recognition and proposes an emotion recognition network model based on modal fusion. First, features of three modalities are extracted so that the multimodal data are unified in format and aligned. Second, to mine the relationships among the modalities, the text, audio, and video features are fused, and the complementary information among the fused features is used to address the noise interference. On this basis, an attention mechanism and a bidirectional recurrent neural network are used to further capture the fused features and the contextual information across utterances with different emotions, yielding a richer fused feature representation. Finally, a downstream task module is built that uses this richer representation to improve emotion recognition performance. Experiments with the proposed network model were carried out on three datasets. The results show that multimodal input performs better than a single modality and that the modal-fusion-based emotion recognition network achieves good recognition performance. The conclusions can be used to guide utterance-level emotion recognition.
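The fusion and context-capturing steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion operator or the attention form, so concatenation-based fusion and scaled dot-product attention over the utterances of one dialogue are assumptions here, and all function names are hypothetical. The bidirectional recurrent layer and the fully connected classifier are omitted to keep the example dependency-free.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse(text: Sequence[float], audio: Sequence[float], video: Sequence[float]) -> Vector:
    """Feature-level fusion by concatenation (assumed operator, not from the paper)."""
    return list(text) + list(audio) + list(video)


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, context: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention of one utterance over all utterances."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in context]
    weights = softmax(scores)
    # Weighted sum of the context vectors, component by component.
    return [sum(w * vec[i] for w, vec in zip(weights, context))
            for i in range(len(query))]


def contextualize(utterances: Sequence[Vector]) -> List[Vector]:
    """Replace each fused utterance vector with its attention-weighted context."""
    return [attend(u, utterances) for u in utterances]


# Toy dialogue with three utterances; the feature values are made up.
dialogue = [
    fuse([1.0, 0.0], [0.5], [0.2, 0.1]),
    fuse([0.0, 1.0], [0.4], [0.3, 0.0]),
    fuse([1.0, 1.0], [0.1], [0.0, 0.9]),
]
contextual = contextualize(dialogue)
```

Each output vector is a convex combination of the fused utterance vectors, so a noisy utterance is smoothed toward the utterances it most resembles. In a fuller model, these vectors would feed a bidirectional recurrent layer and a classifier.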
Authors
文培煜
聂国豪
王兴梅
吴沛然
WEN Peiyu; NIE Guohao; WANG Xingmei; WU Peiran (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; National Key Laboratory of Underwater Acoustic Technology, Harbin Engineering University, Harbin 150001, China)
Source
《应用科技》
CAS
2024, Issue 1, pp. 51-58, 97 (9 pages in total)
Applied Science and Technology
Funding
Key Laboratory Open Fund Project (KY10600220048).
Keywords
deep learning
emotion recognition
multimodal
multimodal fusion
recurrent neural network
bi-directional gated recurrent unit
fully connected neural network
attention mechanism