
Speech Emotion Recognition Based on Dynamic Convolution Recurrent Neural Network (Cited by: 2)
Abstract: Dynamic emotion features are important in speaker-independent speech emotion recognition. However, the limited mining of time-frequency information in speech restricts the representation ability of existing dynamic emotion features. To better extract dynamic emotional features from speech, this study proposes a dynamic convolution recurrent neural network model for speech emotion recognition. First, based on dynamic convolution theory, a dynamic convolution neural network is constructed to extract global dynamic emotional information from the spectrogram, and an attention mechanism strengthens the representation of key emotional regions of the feature map along the time and frequency dimensions, respectively. Simultaneously, a Bi-directional Long Short-Term Memory (BiLSTM) network learns the spectrogram frame by frame to extract dynamic frame-level features and the temporal dependence of emotion. Finally, a Maximum Density Divergence (MDD) loss aligns the features of new individuals with the feature distribution of the training set, reducing the impact of individual differences on the feature distribution and improving the representation ability of the model. Experimental results show that the proposed model achieves weighted average accuracies of 59.50%, 88.01%, and 66.90% on the CASIA Chinese, Emo-db German, and IEMOCAP English emotion corpora, respectively. Compared with other mainstream models (HuWSF, CB-SER, RNN-Att, etc.), recognition accuracy on the three corpora improves by 1.25-16.00, 0.71-2.26, and 2.16-8.10 percentage points, respectively, verifying the effectiveness of the proposed model.
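The core building block named in the abstract, dynamic convolution, aggregates several candidate kernels with input-dependent attention weights and then applies the single fused kernel. The following is a minimal numpy sketch of that aggregation step only, not the paper's implementation: the single-channel input, the mean-pooling squeeze, and the linear attention weights `w_att` are illustrative simplifications (the usual attention branch is a small MLP followed by softmax).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_conv2d(x, kernels, w_att):
    """Fuse K candidate kernels with input-dependent attention weights,
    then apply the fused kernel as a valid 2-D convolution.

    x       : (H, W)     single-channel spectrogram patch (simplification)
    kernels : (K, kh, kw) candidate kernels
    w_att   : (K,)        hypothetical linear attention weights (assumption)
    """
    pooled = x.mean()                          # squeeze: global average pooling
    pi = softmax(w_att * pooled)               # input-dependent kernel weights
    fused = np.tensordot(pi, kernels, axes=1)  # (kh, kw) attention-fused kernel
    kh, kw = fused.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):              # naive valid convolution
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * fused)
    return out, pi
```

Because the kernel weights `pi` depend on the input, the effective filter changes from utterance to utterance, which is what lets the network track the global dynamic emotional information the abstract refers to.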
Authors: GENG Lei, FU Hongliang, TAO Huawei, LU Yuan, GUO Xinying, ZHAO Li (Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; School of Information Science and Engineering, Southeast University, Nanjing 210096, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2023, No. 4, pp. 125-130, 137 (7 pages).
Funding: National Natural Science Foundation of China (61901159); Key Scientific Research Projects of Higher Education Institutions of Henan Province (22A520004, 22A510001).
Keywords: speech emotion recognition; feature extraction; dynamic feature; attention mechanism; neural network