Abstract
To address the dynamic dependence between temporal and spatial features in speech emotion recognition, a nonlinear spatio-temporal feature fusion model based on an attention mechanism is proposed. The model uses an attention-based long short-term memory (LSTM) network to extract temporal features from the speech signal and a temporal convolutional network (TCN) to extract spatial features, and then fuses the two nonlinearly with an attention mechanism; the fused high-level features are fed to a fully connected layer for speech emotion recognition. The method is evaluated on the IEMOCAP dataset. Experimental results show that it jointly captures the intrinsic correlation between temporal and spatial features, and that, compared with linear fusion, nonlinear feature fusion via the attention mechanism effectively improves speech emotion recognition accuracy.
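The attention-based nonlinear fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimension, the number of emotion classes, the tanh scoring function, and the random weights are all assumptions, and the LSTM/TCN branch outputs are stubbed with random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions (not specified in the abstract).
D = 128          # feature dimension of each branch
N_CLASSES = 4    # e.g. four emotion categories on IEMOCAP

# Stand-ins for the two branch outputs: an utterance-level temporal
# feature from the attention-LSTM and a spatial feature from the TCN.
h_temporal = rng.standard_normal(D)
h_spatial = rng.standard_normal(D)

# Attention-based nonlinear fusion (one plausible form): score each
# branch with a learned vector through a tanh nonlinearity, normalise
# the scores with softmax, and take the weighted combination.
w_att = rng.standard_normal((2, D))
scores = np.array([np.tanh(w_att[0] @ h_temporal),
                   np.tanh(w_att[1] @ h_spatial)])
alpha = softmax(scores)                      # attention weights, sum to 1
fused = alpha[0] * h_temporal + alpha[1] * h_spatial

# Fully connected layer producing class probabilities.
W_fc = rng.standard_normal((N_CLASSES, D))
probs = softmax(W_fc @ fused)
```

Because the weights `alpha` depend on the features themselves, the combination is nonlinear in the inputs, unlike a fixed-weight (linear) fusion of the two branches.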
Authors
Zhou Weidong; Zhou Houpan; Xia Pengfei (College of Automation (Artificial Intelligence), Hangzhou Dianzi University, Hangzhou 310000, Zhejiang, China)
Source
Computer Applications and Software (《计算机应用与软件》), a Peking University Core journal, 2023, No. 1, pp. 216-221 and 272 (7 pages)
Keywords
Speech emotion recognition
Long short-term memory network
Temporal convolutional network
Nonlinear fusion