Feature Fusion Based on Main-Auxiliary Network for Speech Emotion Recognition
(基于主辅网络特征融合的语音情感识别) · Cited by: 8
Abstract  Speech emotion recognition is an important research direction in human-computer interaction, and effective feature extraction and fusion are key factors in improving the recognition rate. This paper proposes a speech emotion recognition algorithm that uses a main-auxiliary network for deep feature fusion. First, segment-level features are fed into a BLSTM-Attention network as the main network, where the attention mechanism focuses on the emotional information in the speech signal. Then, Mel-spectrogram features are fed into a convolutional neural network with Global Average Pooling (CNN-GAP) as the auxiliary network; GAP reduces the overfitting introduced by fully connected layers. Finally, the deep features extracted by the two networks are fused in a main-auxiliary manner, addressing the unsatisfactory recognition results caused by directly fusing features of different types. Experiments comparing four models on the IEMOCAP dataset show that both weighted accuracy (WA) and unweighted accuracy (UA) improve to different degrees with main-auxiliary deep feature fusion.
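The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions are assumed, the BLSTM and CNN backbones are replaced by random feature tensors, and the main-auxiliary fusion step is modeled here simply as concatenation, since the abstract does not specify the fusion details.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(segments, w):
    # segments: (T, D) segment-level features from the main (BLSTM) branch;
    # the attention weights emphasize emotionally salient segments
    scores = softmax(segments @ w)      # (T,) attention weights, sum to 1
    return scores @ segments            # (D,) weighted sum -> utterance vector

def global_avg_pool(feat_map):
    # feat_map: (C, H, W) CNN feature map of the Mel spectrogram (auxiliary branch);
    # GAP averages each channel map, avoiding a large fully connected layer
    return feat_map.mean(axis=(1, 2))   # (C,)

def fuse(main_vec, aux_vec):
    # main-auxiliary fusion, sketched here as simple concatenation (an assumption)
    return np.concatenate([main_vec, aux_vec])

rng = np.random.default_rng(0)
main = attention_pool(rng.normal(size=(50, 128)), rng.normal(size=128))  # 50 segments, 128-D (assumed)
aux = global_avg_pool(rng.normal(size=(64, 8, 8)))                       # 64 channels (assumed)
fused = fuse(main, aux)                                                  # (192,) utterance representation
```

The fused vector would then feed an emotion classifier; in the paper the two branches are trained networks rather than the stand-in random tensors used here.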
Authors  HU Desheng (胡德生); ZHANG Xueying (张雪英); ZHANG Jing (张静); LI Baoyun (李宝芸) — College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
Source  Journal of Taiyuan University of Technology (《太原理工大学学报》), CAS-indexed, Peking University Core Journal, 2021, Issue 5, pp. 769-774 (6 pages)
Funding  National Natural Science Foundation of China (61371193); Shanxi Scholarship Council Research Project for Returned Overseas Scholars (HGKY2019025); Shanxi Graduate Education Innovation Program (2020BY130)
Keywords  speech emotion recognition; main-auxiliary network; long short-term memory; convolutional neural network
