
Speech emotion recognition algorithm based on MFCC feature fusion
Cited by: 2
Abstract  In current speech emotion recognition, a single Mel-frequency cepstral coefficient (MFCC) spectrum cannot fully capture the emotional characteristics contained in speech, while multi-feature fusion easily leads to excessive dimensionality. To address these problems, this paper proposes a speech emotion recognition algorithm that fuses MFCC and its differential spectra and classifies them with a bidirectional long short-term memory network combined with a convolutional neural network (Bi-LSTM-CNN). First, the MFCC features of the speech signal are extracted, and difference operations yield the first-order and second-order differential spectra. Principal component analysis (PCA) is then applied to each of the three spectra to retain the dimensions with the highest contribution, achieving dimensionality reduction; the three reduced spectra are stacked vertically, from top to bottom, to form an MFCC differential fusion spectrum that combines dynamic and static information. In the training stage, the Bi-LSTM-CNN model learns speech emotion characteristics from the fused feature spectrum and is optimized with a sparse categorical cross-entropy loss. Experimental results show an accuracy of 81.32% on the RAVDESS dataset and 85.51% on the EMO-DB dataset, 4.85% higher than mainstream emotion recognition models.
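The abstract's first step computes first- and second-order differences of the static MFCC spectrum. The paper's exact implementation is not given; below is a minimal numpy sketch of the standard regression-based delta computation, applied to a placeholder MFCC matrix (the random input, 13 coefficients, and window width are assumptions for illustration).

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta (differential) features, as commonly used
    to capture MFCC dynamics. features: (n_coeff, n_frames) array.
    Edge frames are handled by repeating the boundary columns."""
    n = width
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = np.pad(features, ((0, 0), (n, n)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[1]):
        acc = np.zeros(features.shape[0])
        for i in range(1, n + 1):
            acc += i * (padded[:, t + n + i] - padded[:, t + n - i])
        out[:, t] = acc / denom
    return out

# Placeholder static MFCC spectrum: 13 coefficients x 100 frames (assumed sizes).
mfcc = np.random.default_rng(0).normal(size=(13, 100))
d1 = delta(mfcc)   # first-order differential spectrum
d2 = delta(d1)     # second-order differential spectrum
print(mfcc.shape, d1.shape, d2.shape)
```

All three spectra share the same shape, which is what allows the later PCA-and-stack fusion step to treat them uniformly.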
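The fusion step then applies PCA to each of the three spectra and stacks the reduced spectra from top to bottom. A numpy-only sketch of that idea follows; the retained dimension `k = 8` and the random inputs are assumptions, since the paper does not state its chosen dimensionality here.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_frames, n_coeff) onto its top-k principal components,
    i.e. keep the k directions with the highest variance contribution."""
    Xc = X - X.mean(axis=0)                      # center each coefficient
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # (n_frames, k)

rng = np.random.default_rng(1)
# Placeholder static, first-order, and second-order spectra (frames x coeffs).
mfcc, d1, d2 = (rng.normal(size=(100, 13)) for _ in range(3))

k = 8  # dimensions retained per spectrum (assumed value)
# Reduce each spectrum, then stack the three (k, n_frames) blocks vertically,
# top to bottom, to form the dynamic-static fusion spectrum.
fused = np.vstack([pca_reduce(s, k).T for s in (mfcc, d1, d2)])
print(fused.shape)  # (3*k, n_frames)
```

The fused (3k, n_frames) matrix is what the abstract describes feeding to the Bi-LSTM-CNN in the training stage.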
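Training is said to use a sparse cross-entropy loss, i.e. cross-entropy computed directly from integer class labels rather than one-hot vectors. As a self-contained illustration (the probabilities and labels below are invented for the example):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-12):
    """Mean cross-entropy with integer class labels.
    y_true: (n,) integer labels; probs: (n, n_classes) predicted probabilities."""
    probs = np.clip(probs, eps, 1.0)             # guard against log(0)
    picked = probs[np.arange(len(y_true)), y_true]  # probability of the true class
    return -np.mean(np.log(picked))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y = np.array([0, 1])
loss = sparse_categorical_crossentropy(y, probs)
print(round(loss, 4))  # 0.2899
```

Using integer labels avoids materializing one-hot matrices, which is why this variant is the usual choice for multi-class emotion labels.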
Authors  HUANG Xiyang, DU Qingzhi, LONG Hua, SHAO Yubin (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China)
Source  Journal of Shaanxi University of Technology (Natural Science Edition), 2023, No. 4, pp. 17-25 (9 pages)
Funding  Open project of the Yunnan Provincial Key Laboratory of Media Convergence (320225403).
Keywords  speech emotion recognition; principal component analysis; bidirectional long short-term memory; MFCC differential fusion spectrum; deep learning
