Abstract
In current speech emotion recognition, methods based on a single Mel-frequency cepstral coefficient (MFCC) spectrum cannot fully capture the emotional characteristics of speech, while multi-feature fusion tends to produce excessively high-dimensional inputs. To address these problems, this paper proposes a speech emotion recognition algorithm that feeds a fusion of the MFCC spectrum and its differential spectra into a bidirectional long short-term memory network combined with a convolutional neural network (Bi-LSTM-CNN). First, MFCC features are extracted from the speech signal, and difference operations yield the first-order and second-order differential feature spectra. Principal component analysis (PCA) is then applied to each of the three spectra separately, and the dimensions with the highest contribution are retained to form new, lower-dimensional spectra. The three reduced spectra are stacked in order from top to bottom to obtain an MFCC differential fusion spectrum that combines dynamic and static features. In the training stage, the Bi-LSTM-CNN model learns the emotional characteristics of speech from the fused spectrum, with sparse cross-entropy used to obtain the optimal result. Experiments show an accuracy of 81.32% on the RAVDESS dataset and 85.51% on the EMO-DB dataset, 4.85% higher than mainstream emotion recognition models.
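The feature pipeline described in the abstract (difference spectra, per-spectrum PCA, vertical stacking) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the regression-style difference with a window of 2 frames, the SVD-based PCA, the choice of 13 retained components, and the random matrix standing in for real MFCCs are all assumptions.

```python
import numpy as np


def delta(feat, width=2):
    """Regression-style difference along the time axis.

    feat has shape (n_coeffs, n_frames). The window width is an
    assumption; the abstract does not state how differences are taken.
    """
    n_frames = feat.shape[1]
    # Repeat the edge frames so every frame has `width` neighbours.
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros(feat.shape, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[:, width + k: width + k + n_frames]
        behind = padded[:, width - k: width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom


def pca_reduce(feat, n_keep):
    """Project frames onto the n_keep strongest principal directions.

    Frames are treated as samples and coefficients as variables, so the
    result has shape (n_keep, n_frames).
    """
    centered = feat - feat.mean(axis=1, keepdims=True)
    # Columns of u are principal directions, ordered by singular value.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_keep].T @ centered


def fuse_mfcc_spectra(mfcc, n_keep=13):
    """Stack reduced static, first- and second-order spectra top to bottom."""
    d1 = delta(mfcc)   # first-order difference spectrum
    d2 = delta(d1)     # second-order difference spectrum
    parts = [pca_reduce(x, n_keep) for x in (mfcc, d1, d2)]
    return np.vstack(parts)


# Stand-in for a real MFCC matrix (40 coefficients, 200 frames).
rng = np.random.default_rng(0)
demo_mfcc = rng.standard_normal((40, 200))
fused = fuse_mfcc_spectra(demo_mfcc, n_keep=13)
```

With a real recording, `demo_mfcc` would be replaced by MFCCs computed from the audio; the fused array would then serve as the two-dimensional input to a classifier such as the Bi-LSTM-CNN named in the abstract.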
Authors
HUANG Xiyang (黄喜阳)
DU Qingzhi (杜庆治)
LONG Hua (龙华)
SHAO Yubin (邵玉斌)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Source
Journal of Shaanxi University of Technology: Natural Science Edition (《陕西理工大学学报(自然科学版)》)
2023, Issue 4, pp. 17-25 (9 pages)
Funding
Open Project of the Yunnan Provincial Key Laboratory of Media Convergence (320225403).
Keywords
speech emotion recognition
principal component analysis
bidirectional long short-term memory
MFCC differential fusion spectrum
deep learning