Abstract
Considering that the sample sizes of existing speech emotion datasets are insufficient to support the training of deep neural networks, and that gradient explosion arises as network depth increases, this paper proposes a speech emotion recognition method based on a dilated residual network with an auxiliary classifier and channel-spatial attention (DRN-A-CASA). Gaussian white noise and random time-frequency masking are first employed to augment the datasets. The augmented Mel-spectrogram dataset is then used as the input of the network model, and dilated convolutions are applied in place of the residual network's original convolution layers to enlarge the receptive field for feature extraction. Next, an auxiliary classifier branch is added after layer 3 of the residual network to accelerate training and improve the loss function. Finally, an attention mechanism is introduced in layer 4 to focus on emotional features for speech emotion classification. Experimental results show that the DRN-A-CASA model achieves recognition accuracies of 92.91% and 89.15% on the RAVDESS and EMODB datasets respectively, demonstrating the effectiveness and generalization performance of the proposed method.
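The two augmentation steps named in the abstract (Gaussian white noise plus random time-frequency masking on the Mel-spectrogram) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function name `augment_mel` and all parameter values are assumptions, and the spectrogram is represented as a plain list of rows (frequency × time) so the sketch stays dependency-free.

```python
import random

def augment_mel(spec, noise_std=0.01, max_f_mask=8, max_t_mask=16, seed=None):
    """Augment a Mel-spectrogram (freq x time) with Gaussian white noise,
    one random frequency mask, and one random time mask.

    Hypothetical sketch of the augmentation described in the abstract;
    parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(spec), len(spec[0])

    # Step 1: add zero-mean Gaussian white noise to every time-frequency bin.
    out = [[v + rng.gauss(0.0, noise_std) for v in row] for row in spec]

    # Step 2: frequency mask - zero out a random contiguous band of Mel bins.
    f = rng.randint(0, min(max_f_mask, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Step 3: time mask - zero out a random contiguous span of frames.
    t = rng.randint(0, min(max_t_mask, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out
```

In practice each augmented copy enlarges the training set, which is the point of the step: the abstract motivates it by the limited sample counts of RAVDESS and EMODB.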
Authors
Zhou Jiaxin; Jiao Yameng; Wang Yanbin; Zheng Yanru (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China)
Source
Foreign Electronic Measurement Technology (《国外电子测量技术》)
Peking University Core Journal
2023, Issue 8, pp. 19-25 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61901347), the Key Project of the China Academic Degrees and Graduate Education Society (2020ZDB67), and the Research Program of the Xi'an Science and Technology Bureau (22GXFW0034).
Keywords
speech emotion recognition
residual network
attention
data augmentation