Abstract
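In speech emotion classification, most current deep-learning-based methods do not model time-domain and frequency-domain features, take a long time to train, and achieve low recognition rates. To address these problems, a spectrogram-based emotion classification algorithm using neural networks is proposed. The spectrogram is first selected as the model input. To reduce the loss of shallow features during speech emotion feature extraction and of contextual detail features during training, the model fuses a ResNet18 network with residual blocks and a bidirectional long short-term memory (BLSTM) network with an embedded attention mechanism. ResNet18 extracts the spectrogram features, the attention mechanism then weights them, and the BLSTM network is trained on and classifies the weighted features. The model achieves a recognition rate of 88.2% on the CASIA dataset. Compared with other methods, the proposed algorithm classifies speech emotion better and substantially shortens overall training time.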
Aiming at the problem of low speech emotion recognition rates, an emotion classification algorithm based on neural networks is proposed. First, to reduce the loss of shallow features and contextual details during speech emotion feature extraction and training, this paper proposes a fusion model of a ResNet18 network with residual blocks and a bidirectional long short-term memory (BLSTM) network with an embedded attention mechanism. ResNet18 extracts and normalizes the spectrogram features, the attention mechanism then weights them, and the BLSTM network is trained on and classifies the weighted features. The model achieves a recognition rate of 88.2% on the CASIA dataset. Comparison with recognition rates reported in the existing literature verifies the advantages of this algorithm.
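The attention-weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so this example assumes a simple dot-product score against a hypothetical learned vector `query`, followed by a softmax over frames; the function names and the toy feature values are illustrative only and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weight(features, query):
    """Weight each frame vector by a softmax attention coefficient.

    features: list of T frame vectors (e.g. CNN outputs per time step).
    query:    hypothetical learned scoring vector (an assumption here).
    Returns the attention weights and the re-weighted frame vectors.
    """
    # Dot-product score per frame (assumed scoring function).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in features]
    weights = softmax(scores)
    # Scale every frame by its attention weight.
    weighted = [[w * x for x in frame] for w, frame in zip(weights, features)]
    return weights, weighted


# Toy input: 3 time steps, 2 feature dimensions (made-up values).
features = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3]]
query = [1.0, 1.0]
weights, weighted = attention_weight(features, query)
```

In a pipeline of the kind the abstract describes, the `weighted` sequence would then be passed to the recurrent (BLSTM) stage; here the frame with the largest score receives the largest weight, and the weights sum to one.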
Authors
Jin Lu (金鹭)
Zhang Shouming (张寿明)
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source
Electronic Measurement Technology (《电子测量技术》)
2020, No. 24, pp. 57-63 (7 pages)