
Multi-modal Emotion Recognition Using Speech, Text and Motion (cited by 7)
Abstract To address the low accuracy and weak generalization of machine recognition of human emotion, this paper proposes a fusion method for emotion recognition over three modalities: speech, text, and facial expression and motion. In the speech modality, a deep wavefield extrapolation and improved wave physics model (DWE-WPM) is designed to simulate the sequence information mining process of a long short-term memory (LSTM) network. In the text modality, a Transformer model with a multi-head attention mechanism captures latent semantic expressions of emotion. In the motion modality, sequence features extracted from facial expressions and hand movements are combined with a bidirectional three-layer LSTM model with an attention mechanism. Finally, a modality fusion scheme based on multiple performance metrics is proposed to achieve high-accuracy emotion recognition with strong generalization. On the widely used Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus, the proposed method is compared with existing emotion recognition algorithms. Experimental results show that it achieves higher recognition accuracy in both single-modality and multi-modality settings, with average accuracy improvements of 16.4% and 10.5%, respectively, effectively improving emotion recognition in human-computer interaction.
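The abstract does not spell out how the fusion scheme based on multiple performance metrics works, so the following is only a minimal sketch of one plausible reading: a decision-level (late) fusion in which each modality's class probabilities are weighted by the mean of its validation metrics. The label set, the metric names (`accuracy`, `f1`), the function names, and every number below are illustrative assumptions, not values or methods taken from the paper.

```python
# Hypothetical four-class label set; the paper's actual classes are not given in the abstract.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def fusion_weights(metrics):
    """Turn per-modality validation metrics into weights that sum to 1.

    metrics: {modality: {metric_name: value}}. Averaging the metrics is an
    assumption made for this sketch, not the authors' stated rule.
    """
    scores = {m: sum(v.values()) / len(v) for m, v in metrics.items()}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def fuse(probs, weights):
    """Weighted average of the per-modality class probability vectors."""
    fused = [0.0] * len(EMOTIONS)
    for modality, p in probs.items():
        for i, value in enumerate(p):
            fused[i] += weights[modality] * value
    return fused


def predict(probs, weights):
    """Return the emotion label with the highest fused probability."""
    fused = fuse(probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Placeholder validation metrics (made up for illustration).
metrics = {
    "speech": {"accuracy": 0.6, "f1": 0.6},
    "text": {"accuracy": 0.7, "f1": 0.7},
    "motion": {"accuracy": 0.5, "f1": 0.5},
}
# Placeholder per-modality classifier outputs for one utterance.
probs = {
    "speech": [0.5, 0.2, 0.2, 0.1],
    "text": [0.1, 0.6, 0.2, 0.1],
    "motion": [0.4, 0.3, 0.2, 0.1],
}

weights = fusion_weights(metrics)
fused = fuse(probs, weights)
label = predict(probs, weights)
```

In this toy input the speech and motion models lean toward "angry", but the text model, which carries the largest weight, is confident enough in "happy" that the fused decision follows it. A metric-weighted scheme of this kind lets a stronger modality override weaker ones without discarding their evidence.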
Authors JIA Ning; ZHENG Chunjun (School of Software, Dalian Neusoft University of Information, Dalian 116023, Liaoning, China)
Source Journal of Applied Sciences (《应用科学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, No. 1, pp. 55-70 (16 pages)
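Funding Supported by the National Key Research and Development Program of China (No. 2021YFC3320300) and projects of the Education Department of Liaoning Province (No. LJKQZ2021188, No. JG20DB032).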
Keywords speech emotion recognition; text emotion recognition; motion emotion recognition; Transformer model; attention mechanism