
Research on Audio-Visual Dual-Modal Emotion Recognition Fusion Framework

Cited: 8
Abstract: Aiming at the low recognition rate and poor reliability of dual-modal emotion recognition frameworks, this paper studies feature-level fusion of the two modalities most important for emotion recognition: speech and facial expression. A feature extraction method based on prior knowledge and a VGGNet-19 network are used to extract features from the preprocessed audio and video signals, respectively. Feature fusion is achieved by direct concatenation followed by PCA dimensionality reduction, and a BLSTM network is used to build the model that performs emotion recognition. The framework is tested on the AViD-Corpus and SEMAINE databases and compared with a traditional feature-level fusion framework for emotion recognition as well as with frameworks based on VGGNet-19 or BLSTM alone. The experimental results show that the root mean square error (RMSE) of emotion recognition is reduced and the Pearson correlation coefficient (PCC) is improved, which verifies the effectiveness of the proposed method.
Authors: 宋冠军 (SONG Guanjun), 张树东 (ZHANG Shudong), 卫飞高 (WEI Feigao); College of Information Engineering, Capital Normal University, Beijing 100048, China
Source: Computer Engineering and Applications (《计算机工程与应用》, CSCD, Peking University Core Journal), 2020, No. 6, pp. 140-146 (7 pages)
Funding: National Key Research and Development Program of China (No. 2017YFB1400803, No. 2018YFB1004103); National Natural Science Foundation of China (No. 31571563, No. 61601310)
Keywords: audio-visual; dual-modal; feature-level fusion; emotion recognition; BLSTM
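The pipeline outlined in the abstract lends itself to a short illustration. The following is a minimal sketch, not the paper's implementation: the feature dimensions, the reduced dimension, the single emotion dimension, and the randomly generated stand-in data are all assumptions made for the example. It shows the described steps in order: per-frame audio descriptors and VGGNet-19 visual features are concatenated (direct cascade), reduced with PCA, passed to a BLSTM regressor, and then scored with RMSE and PCC, the two metrics reported in the paper.

```python
# A minimal, illustrative sketch of the feature-level fusion pipeline described
# in the abstract. All sizes and data below are assumptions, not the paper's settings.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-ins for the two unimodal feature streams of one clip (hypothetical sizes).
T = 120                                        # number of aligned frames
audio_feats = rng.standard_normal((T, 88))     # prior-knowledge audio descriptors
visual_feats = rng.standard_normal((T, 4096))  # VGGNet-19 fully-connected activations

# Feature-level fusion: direct concatenation followed by PCA dimensionality reduction.
fused = np.concatenate([audio_feats, visual_feats], axis=1)   # (T, 4184)
pca = PCA(n_components=64)                     # reduced dimension is an assumption
fused_low = pca.fit_transform(fused)           # (T, 64)

class BLSTMRegressor(nn.Module):
    """Bidirectional LSTM that predicts one continuous emotion value per frame."""
    def __init__(self, in_dim=64, hidden=64):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                      # x: (batch, T, in_dim)
        out, _ = self.blstm(x)                 # (batch, T, 2 * hidden)
        return self.head(out).squeeze(-1)      # (batch, T)

model = BLSTMRegressor()
x = torch.tensor(fused_low, dtype=torch.float32).unsqueeze(0)
pred = model(x).squeeze(0).detach().numpy()

# Evaluation metrics reported in the paper: RMSE and Pearson correlation coefficient.
target = rng.standard_normal(T)                # placeholder ground-truth annotations
rmse = np.sqrt(np.mean((pred - target) ** 2))
pcc = np.corrcoef(pred, target)[0, 1]
print(f"RMSE={rmse:.3f}  PCC={pcc:.3f}")
```

In a real experiment the PCA projection would be fit on training data only and the BLSTM trained with a regression loss against the continuous emotion annotations; the untrained model and random data here serve only to make the data flow of the fusion framework concrete.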


相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部