Abstract
In mixed-language speech emotion recognition, traditional recognition methods do not adequately account for the differences between languages, which leads to low classification accuracy. To address this, a method combining an autoencoder with a Long Short-Term Memory (LSTM) model is proposed. The method extracts MFCC, mel-spectrogram, and chroma features to obtain a 180-dimensional feature vector, then uses the autoencoder to obtain a higher-dimensional, deeper 500-dimensional representation, which is modeled by the LSTM to improve the accuracy of speech emotion classification. Classification experiments on the German EMO-DB and Chinese CASIA speech databases show that the deep features extracted by the autoencoder are better suited to mixed-language speech emotion classification. Compared with traditional classification methods, the Autoencoder-LSTM improves the optimal recognition result by 7.5%.
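The pipeline the abstract describes (180-dimensional acoustic features, an autoencoder that expands them to a 500-dimensional code, and an LSTM classifier over the code sequence) can be sketched as below. Only the dimensions 180 and 500 come from the abstract; the feature split (40 MFCC + 128 mel bands + 12 chroma bins), the hidden size, the class count, and all weights are illustrative assumptions, stood in here by random placeholders rather than the trained parameters of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per the abstract, each frame yields a 180-dim feature vector
# (assumed split: 40 MFCC + 128 mel-spectrogram bands + 12 chroma bins).
n_frames, feat_dim, code_dim, hidden, n_classes = 20, 180, 500, 64, 6
features = rng.standard_normal((n_frames, feat_dim))

# Autoencoder encoder, 180 -> 500. Weights are random placeholders;
# in the paper they would be learned by reconstruction training.
W_enc = rng.standard_normal((feat_dim, code_dim)) * 0.05
b_enc = np.zeros(code_dim)
codes = np.tanh(features @ W_enc + b_enc)          # shape (n_frames, 500)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal single-layer LSTM run over the 500-dim code sequence.
W = rng.standard_normal((code_dim + hidden, 4 * hidden)) * 0.05
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for x_t in codes:
    z = np.concatenate([x_t, h]) @ W + b
    i, f, o, g = np.split(z, 4)                    # input/forget/output/cell gates
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)

# Softmax over emotion classes from the final hidden state.
logits = h @ (rng.standard_normal((hidden, n_classes)) * 0.05)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(codes.shape, probs.shape)                    # (20, 500) (6,)
```

The expansion to a higher-dimensional code (rather than the usual bottleneck) matches the abstract's claim that a "higher-dimensional, deeper" representation helps separate language-dependent emotion cues before the LSTM models their temporal dynamics.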
Authors
ZHANG Wei, JIA Yu, ZHANG Xue-Ying (College of Information, Shanxi University of Finance and Economics, Taiyuan, Shanxi 030006, China; College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China)
Source
Computer Simulation (《计算机仿真》), a Peking University Core Journal, 2022, No. 11, pp. 258-262 (5 pages)
Funding
National Science Foundation for Young Scientists of China (61902226)
Shanxi Province Science Foundation for Youths (201901D211415)
Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2019L0498)
Youth Research Fund of Shanxi University of Finance and Economics (QN-2019017)