SCBAMM Network for Speech Emotion Recognition
Abstract: Speech emotion recognition (SER) is an important research direction within automatic speech recognition. Two central problems in SER are extracting the features that best characterize emotion in speech and building an acoustic model that is robust and generalizes well. To this end, a new acoustic model for SER is constructed from attention, skip connections, and masking: SCBAMM (skip-convolution-BiLSTM based on attention mechanism with mask operation), that is, an attention-based skip-convolution bidirectional recurrent neural network with a masking operation. The model has eight hidden layers, in order: two dense (fully connected) layers, a convolutional layer, a skip layer, a masking layer, a Bi-LSTM layer, an attention layer, and a pooling layer. The convolutional layer extracts spatial features of speech emotion; the Bi-LSTM layer extracts time-series features; the skip layer mainly addresses the gradient problem; the masking layer keeps zero-valued entries in the data out of the computation, which reduces the computational cost; the attention layer assigns weights according to how much each time-series feature contributes to the emotion; and the pooling layer computes the weight of the whole speech emotion sequence. Experimental results show that SCBAMM achieves a recognition performance of 92.34% on the EMO-DB dataset.
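The abstract does not give the equations behind the masking, attention, and pooling layers, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that padded frames are all-zero rows, that attention scores come from a simple `tanh` projection onto a scoring vector `w`, and that pooling is an attention-weighted sum over time. The function name `masked_attention_pool` and all shapes are hypothetical.

```python
import numpy as np


def masked_attention_pool(features, w, pad_value=0.0):
    """Pool frame-level features (T, D) into one utterance vector (D,).

    Assumptions (not taken from the paper): padded frames are rows made
    entirely of `pad_value`, and the attention score of a frame is
    tanh(frame) . w for a hypothetical scoring vector w of shape (D,).
    """
    # Masking: a frame is valid unless every entry equals the pad value.
    mask = ~np.all(features == pad_value, axis=1)
    if not mask.any():
        raise ValueError("all frames are padding")

    # Attention: one scalar score per frame; padded frames get -inf so
    # that they receive exactly zero weight after the softmax.
    scores = np.tanh(features) @ w
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores[mask].max()  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Pooling: attention-weighted sum over the time axis.
    pooled = weights @ features
    return pooled, weights


# Toy input: 6 frames of 4-dimensional features, last 2 frames are padding.
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))
frames[4:] = 0.0
w = rng.normal(size=4)

pooled, weights = masked_attention_pool(frames, w)
print(weights)
print(pooled.shape)
```

In this sketch the padded frames contribute nothing to the pooled vector, which mirrors the abstract's statement that zero-valued data are excluded from the computation. In the model as described, the masking layer sits before the Bi-LSTM layer; the sketch folds masking into the attention step purely for brevity.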
Authors: ZHANG Huiyun (张会云); HUANG Heming (黄鹤鸣) (School of Computer Science, Qinghai Normal University, Xining 810008, China; National Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China; Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining 810008, China)
Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 5, pp. 79-83 (5 pages)
Funding: National Natural Science Foundation of China (62066039).
Keywords: speech emotion recognition (SER); feature extraction; acoustic model; attention mechanism; skip connection; masking operation