
Method for Chinese Speech Emotion Recognition Based on Improved Speech Processing Convolutional Neural Network

Cited by: 15
Abstract: Speech emotion recognition is important in human-computer interaction. To address the low efficiency and accuracy of Chinese speech emotion recognition, this study proposes a method based on the Trumpet-6 convolutional neural network model. During Mel Frequency Cepstral Coefficient (MFCC) feature extraction, the number of sampling points used in the framing and windowing step is increased, so that each Hamming window carries more features and fewer Hamming windows are needed. This shrinks the pixel size of the MFCC feature map and improves the processing efficiency of a single recognition. On this basis, Gaussian white noise is used to augment the data set, which alleviates overfitting during training. Experimental results on the CASIA speech emotion data set show that the method reaches a test accuracy of 95.7%, better than traditional methods such as Lenet-5, Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). The Trumpet-6 model uses 2048 sampling points and needs only 176550 trainable parameters. Compared with ResNet34, which uses a deep convolutional neural network (DCNN), and with recurrent neural network models, it has fewer parameters, a simpler structure, and faster processing.
Authors: QIAO Dong (乔栋); CHEN Zhangjin (陈章进); DENG Liang (邓良); TU Chengli (屠程力). Affiliations: Microelectronics Research and Development Center, Shanghai University, Shanghai 200444, China; Computing Centre, Shanghai University, Shanghai 200444, China
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2022, No. 2, pp. 281-290 (10 pages)
Funding: National Natural Science Foundation of China (61674100).
Keywords: speech emotion recognition; MFCC feature; Gaussian white noise; data augmentation; Convolutional Neural Network (CNN)
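
The two preprocessing ideas in the abstract (longer frames giving fewer Hamming windows, and Gaussian white noise augmentation) can be illustrated with a minimal sketch. This is not the authors' implementation. Only the 2048-sample frame length comes from the abstract. The 16 kHz sampling rate, the 3-second clip length, the 50% frame overlap, the 512-sample baseline frame, the 20 dB signal-to-noise ratio, and all function names are assumptions chosen for illustration.

```python
import math
import random


def num_frames(num_samples: int, frame_len: int, hop_len: int) -> int:
    """Count the full frames obtained by sliding a window of frame_len by hop_len."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop_len


def hamming(frame_len: int) -> list[float]:
    """Hamming window coefficients: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def add_white_noise(signal: list[float], snr_db: float,
                    rng: random.Random) -> list[float]:
    """Return a copy of signal with zero-mean Gaussian noise at the given SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Assumed example: a 3-second clip sampled at 16 kHz (48000 samples).
clip_len = 3 * 16000

# Hypothetical baseline: 512-sample frames with 50% overlap.
frames_short = num_frames(clip_len, frame_len=512, hop_len=256)

# 2048-sample frames (the frame length stated in the abstract), 50% overlap assumed.
frames_long = num_frames(clip_len, frame_len=2048, hop_len=1024)

# Each frame would be multiplied by this window before the FFT step of MFCC.
window = hamming(2048)

# Augmentation: a noisy copy of a synthetic 440 Hz tone at an assumed 20 dB SNR.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(clip_len)]
noisy = add_white_noise(clean, snr_db=20.0, rng=rng)
```

Under these assumptions the longer frame cuts the number of windows from 186 to 45, so the time axis of the MFCC feature map shrinks roughly fourfold. The noisy copy would be added to the training set alongside the clean clip.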