Method Research on Multimodal Emotion Recognition Based on Audio and Video (cited by 6)
Abstract: In recent years, affective computing has gradually become one of the keys to advancing human-computer interaction, and emotion recognition, as an important part of affective computing, has received extensive attention. This paper implements a facial expression recognition system based on ResNet18 and a speech emotion recognition model based on the HGFM architecture; by tuning parameters, models with good performance were trained. On this basis, a multimodal emotion recognition system covering video and audio signals was realized through two fusion strategies, feature-level fusion and decision-level fusion, demonstrating the performance advantage of multimodal emotion recognition. Feature-level fusion splices the visual and audio features into one large feature vector, which is then sent to a classifier for recognition. In decision-level fusion, after the prediction probabilities of the visual and audio modalities are obtained from their respective classifiers, the weight of each modality and the fusion strategy are determined according to each modality's reliability, and the classification result is obtained after fusion. Under both fusion strategies, the audio-visual model improves accuracy over the video-only and audio-only models, verifying the conclusion that a multimodal model usually outperforms the best single-modality model. The fused audio-visual bimodal model reaches an accuracy of 76.84%, 3.50% higher than the existing optimal model, giving it a performance advantage over existing audio-visual emotion recognition models.
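The two fusion strategies described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, feature dimensions, and the 0.6 video weight are assumptions for demonstration; the paper determines modality weights from each modality's reliability.

```python
import numpy as np

def feature_level_fusion(video_feat: np.ndarray, audio_feat: np.ndarray) -> np.ndarray:
    """Splice the per-modality feature vectors into one large vector,
    which would then be fed to a single classifier."""
    return np.concatenate([video_feat, audio_feat])

def decision_level_fusion(video_probs: np.ndarray,
                          audio_probs: np.ndarray,
                          video_weight: float = 0.6) -> np.ndarray:
    """Combine per-modality class probabilities with a reliability-based
    weight (0.6 here is an illustrative value, not from the paper)."""
    fused = video_weight * video_probs + (1.0 - video_weight) * audio_probs
    return fused / fused.sum()  # renormalize to a probability distribution

# Toy example with 4 emotion classes.
v = np.array([0.10, 0.60, 0.20, 0.10])  # video classifier output
a = np.array([0.05, 0.40, 0.45, 0.10])  # audio classifier output
print(feature_level_fusion(np.ones(512), np.ones(128)).shape)  # (640,)
print(decision_level_fusion(v, a).argmax())  # class 1
```

Feature-level fusion requires a single classifier trained on the concatenated vector, while decision-level fusion lets each modality keep its own classifier and only combines their outputs.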
Authors: LIN Shurui, ZHANG Xiaohui, GUO Min, ZHANG Weiqiang, WANG Guijin (Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong 518055, China; School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China)
Source: Journal of Signal Processing (《信号处理》, CSCD, Peking University Core Journal), 2021, No. 10, pp. 1889-1898.
Funding: Key Project of the NSFC-General Technology Joint Fund for Basic Research (U1836219).
Keywords: emotion recognition; deep learning; multimodal fusion; residual network; hierarchical grained and feature model
