Abstract
To improve the accuracy of speech emotion recognition for Chinese-speaking children, a fusion model is proposed that combines a deep convolutional neural network (Deep Convolutional Neural Network, DPCNN) with a stacked long short-term memory (Stacked Long Short-Term Memory, SLSTM) network. The DPCNN first extracts long-distance dependencies from the speech signal, the SLSTM then captures emotion-related sequential dependencies, and a softmax classifier finally determines the emotional state. Experimental results show that the DPCNN-SLSTM model reaches an emotion recognition accuracy of 92% on a Chinese children's speech dataset, significantly outperforming the CNN, LSTM, and CNN-LSTM models. The findings are of value for advancing emotion recognition technology for children's speech.
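The last stage named in the abstract, the softmax classifier, can be illustrated with a minimal sketch. It only shows how a vector of raw scores is turned into class probabilities and a predicted emotion; it is not the authors' implementation. The label set `EMOTIONS`, the helper `classify`, and the input scores are hypothetical, since the abstract does not list the dataset's emotion classes or the network's output layer.

```python
import math

# Hypothetical emotion labels; the abstract does not list the dataset's classes.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=EMOTIONS):
    """Return the most probable label together with all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Made-up scores standing in for the output of the last network layer.
label, probs = classify([2.0, 0.5, -1.0, 0.1])
print(label, [round(p, 3) for p in probs])
```

Subtracting the largest score before exponentiating leaves the result unchanged but avoids overflow when the scores are large.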
Authors
DONG Hu (董胡); PENG Gaofeng (彭高丰); CHEN Wei (陈伟)
College of Information Science and Engineering, Changsha Normal College, Changsha, Hunan 410100, China; College of Electronic Information and Electrical Engineering, Changsha College, Changsha, Hunan 410022, China
Source
Communications Technology (《通信技术》)
2024, No. 7, pp. 666-671 (6 pages)
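Funding
Hunan Provincial Education Science "14th Five-Year Plan" project "Research on Deep-Learning-Based Speech Emotion Recognition for Chinese Children and Assessment of Their Social-Emotional Competence" (XJK23BXX003).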
Keywords
deep convolutional neural network
stacked long short-term memory network
fusion model
Chinese children’s speech
emotion recognition