Abstract
Speech emotion recognition is an important direction in human-computer interaction, with practical applications in areas such as call centers. In recent years, deep neural networks have achieved considerable success in emotion recognition, but existing methods that rely on high-level speech features discard much of the original signal information, and their recognition accuracy remains limited. This paper proposes a speech emotion recognition method in which a convolutional neural network extracts features directly from the raw signal and a 2-layer long short-term memory (LSTM) network is stacked on top of it. The final recognition accuracy reaches 91.74%, significantly better than other methods based on the Berlin Database of Emotional Speech (EMO-DB).
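The pipeline named in the abstract (a convolutional front end over the raw waveform, followed by two stacked LSTM layers) can be sketched as a plain forward pass. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the kernel count and width, the stride, the hidden size, the random untrained weights, the sine-wave stand-in signal, and all function and class names are hypothetical. The seven-way output reflects the seven emotion categories in EMO-DB.

```python
import math
import random


def conv1d_relu(signal, kernels, stride):
    """Valid 1-D convolution over a raw waveform, followed by ReLU.

    Returns a list of frames; each frame holds one activation per kernel.
    """
    width = len(kernels[0])
    frames = []
    for start in range(0, len(signal) - width + 1, stride):
        window = signal[start:start + width]
        frames.append([max(0.0, sum(w * x for w, x in zip(k, window)))
                       for k in kernels])
    return frames


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMLayer:
    """A single LSTM layer with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight row per gate unit, ordered: input, forget, candidate, output.
        # The final column of each row acts as the bias.
        self.W = [[rng.uniform(-0.1, 0.1)
                   for _ in range(input_size + hidden_size + 1)]
                  for _ in range(4 * hidden_size)]

    def forward(self, frames):
        n = self.hidden_size
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in frames:
            v = x + h + [1.0]  # current input, previous hidden state, bias term
            z = [sum(w * a for w, a in zip(row, v)) for row in self.W]
            i = [sigmoid(t) for t in z[0:n]]
            f = [sigmoid(t) for t in z[n:2 * n]]
            g = [math.tanh(t) for t in z[2 * n:3 * n]]
            o = [sigmoid(t) for t in z[3 * n:4 * n]]
            c = [f[j] * c[j] + i[j] * g[j] for j in range(n)]
            h = [o[j] * math.tanh(c[j]) for j in range(n)]
            outputs.append(h)
        return outputs


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(signal, kernels, stride, lstm_layers, out_weights):
    """Raw signal -> convolutional frames -> stacked LSTMs -> class probabilities."""
    frames = conv1d_relu(signal, kernels, stride)
    for layer in lstm_layers:
        frames = layer.forward(frames)
    last = frames[-1]  # hidden state after the final frame
    scores = [sum(w * a for w, a in zip(row, last)) for row in out_weights]
    return softmax(scores)


rng = random.Random(0)
NUM_CLASSES = 7  # EMO-DB distinguishes seven emotion categories
signal = [math.sin(0.05 * t) for t in range(400)]  # stand-in for a raw waveform
kernels = [[rng.uniform(-0.5, 0.5) for _ in range(16)] for _ in range(4)]
lstm_layers = [LSTMLayer(4, 8, rng), LSTMLayer(8, 8, rng)]  # the 2-layer stack
out_weights = [[rng.uniform(-0.1, 0.1) for _ in range(8)]
               for _ in range(NUM_CLASSES)]
probs = classify(signal, kernels, 8, lstm_layers, out_weights)
```

Because the weights are random, `probs` carries no meaning beyond showing how data flows through the stages; a practical model would be built and trained with a deep learning framework.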
Authors
YANG Ming-ji (杨明极)
ZHANG Jia-bin (张家彬)
School of Measure-control Technology and Communications Engineering, Harbin University of Science and Technology, Harbin 150080, China
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2019, Issue 8, pp. 127-131 (5 pages)
Keywords
speech emotion recognition
deep learning
convolutional neural network
long short-term memory network