Abstract
A key challenge in speech emotion recognition is extracting the features of the speech signal that matter most for recognition accuracy. Building on existing research, this paper proposes a speech emotion recognition model based on a Multi-Head-Attention Residual Network (MHA-ResNet). First, the raw speech signal is preprocessed and an emotional feature set is extracted. Next, the multi-head attention mechanism, which processes its inputs in parallel and weights them adaptively, is applied to the feature set to obtain an initial set of discriminative emotional information across different states. Finally, a residual network captures deeper emotional features and classifies the emotions. In experiments on the CASIA and EmoDB datasets, the model achieves recognition accuracies of 93.59% and 97.57%, respectively.
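The pipeline described in the abstract (feature frames, then multi-head attention, then a residual stage, then classification) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The weights are random and untrained, the dense residual block stands in for whatever residual layers the paper actually uses, and every name and size here (`multi_head_self_attention`, `residual_block`, `classify`, 8-dimensional features, 2 heads, 6 output classes) is a hypothetical choice made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random (untrained) weights, used only to make the sketch runnable."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, w, num_heads):
    """Scaled dot-product self-attention over T frames of d features each."""
    d_head = len(x[0]) // num_heads
    q, k, v = (matmul(x, w[name]) for name in ("q", "k", "v"))
    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        # Each head scores every frame against every other frame.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head)
                   for kj in kh] for qi in qh]
        heads.append(matmul([softmax(r) for r in scores], vh))
    # Concatenate the heads along the feature axis, then project.
    concat = [sum((head[t] for head in heads), []) for t in range(len(x))]
    return matmul(concat, w["o"])


def residual_block(x, w1, w2):
    """y = x + relu(x W1) W2: a skip connection around a two-layer transform."""
    hidden = [[max(0.0, val) for val in row] for row in matmul(x, w1)]
    out = matmul(hidden, w2)
    return [[a + b for a, b in zip(rx, ro)] for rx, ro in zip(x, out)]


def classify(x, w, num_heads):
    """Attention, then residual block, then mean-pool over time and softmax."""
    deep = residual_block(multi_head_self_attention(x, w, num_heads), w["r1"], w["r2"])
    pooled = [sum(col) / len(deep) for col in zip(*deep)]
    return softmax(matmul([pooled], w["cls"])[0])


# Hypothetical sizes: 5 frames, 8 features per frame, 2 heads, 6 classes.
rng = random.Random(0)
D_MODEL, NUM_HEADS, NUM_CLASSES = 8, 2, 6
weights = {name: random_matrix(D_MODEL, D_MODEL, rng)
           for name in ("q", "k", "v", "o", "r1", "r2")}
weights["cls"] = random_matrix(D_MODEL, NUM_CLASSES, rng)
frames = random_matrix(5, D_MODEL, rng)  # stand-in for an extracted feature set
probs = classify(frames, weights, NUM_HEADS)
```

Here `probs` is a probability distribution over the hypothetical classes. Because the weights are untrained, the values carry no meaning and only show how data flows through the stages.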
Authors
周传华
郝敏
曾辉
王勇
ZHOU Chuanhua; HAO Min; ZENG Hui; WANG Yong (School of Management Science & Engineering, Anhui University of Technology, Maanshan 243002, China; School of Computer Science & Technology, University of Science & Technology of China, Hefei 230026, China)
Source
《微电子学与计算机》
2024, No. 9, pp. 41-46 (6 pages)
Microelectronics & Computer
Funding
National Natural Science Foundation of China (71371013, 71772002).
Keywords
speech emotion recognition
multi-head attention mechanism
residual network
emotional feature set