Abstract
Sound is an important means by which animals express emotion to the outside world. By extracting features from animal sounds and establishing a mapping between those feature values and the animal's emotions, a computer can perceive and understand an animal's emotional state. To improve the performance of animal emotion recognition, this paper proposes a method for recognizing emotion in animal sounds based on a bi-directional long short-term memory (BiLSTM) network with a Bahdanau attention mechanism. The method extracts the spectral centroid, spectral bandwidth, spectral rolloff point, zero-crossing rate, root mean square energy, spectral contrast, and Mel-frequency cepstral coefficients and their first-order differences from the animal sound to form a feature vector, which is fed into the BiLSTM network. The attention mechanism then learns channel-wise weights for the emotional features, and a fully connected layer finally classifies the emotion category. Taking dogs as an example, emotion recognition experiments were conducted on dog sounds. The results show that the proposed method achieves higher recognition accuracy than methods based on a recurrent neural network or a BiLSTM network.
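To make the pipeline in the abstract more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It computes two of the listed frame-level features (zero-crossing rate and root mean square energy) and then pools the frame sequence with a Bahdanau-style additive attention score, e_t = v · tanh(W h_t + b). Several things here are assumptions made for illustration: the function names, the frame length and hop size, the synthetic sine-wave input, and the hand-picked, untrained parameters `W`, `b`, and `v`. The frame features stand in for BiLSTM outputs, and the attention weights are computed over time steps, which may differ from the channel-wise weighting the abstract describes.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames of equal length."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root mean square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def additive_attention(states, W, b, v):
    """Additive (Bahdanau-style) attention pooling over a sequence.

    states: list of T vectors h_t; W, b, v: scoring parameters.
    Returns the softmax weights and the weighted sum of the states.
    """
    scores = []
    for h in states:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [sum(wt * h[d] for wt, h in zip(weights, states))
               for d in range(dim)]
    return weights, context

# Synthetic input (assumption): a 5 Hz sine sampled at 200 Hz for one second.
signal = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
frames = frame_signal(signal, frame_len=50, hop=25)
features = [[zero_crossing_rate(f), rms_energy(f)] for f in frames]

# Hypothetical, untrained parameters; a real model would learn these.
W = [[0.5, -0.3], [0.1, 0.8]]
b = [0.0, 0.1]
v = [1.0, -1.0]
weights, context = additive_attention(features, W, b, v)
print(len(frames), weights, context)
```

In a full system the remaining spectral features and the Mel-frequency cepstral coefficients would typically come from an audio library, the states passed to the attention step would be BiLSTM hidden states rather than raw features, and the pooled context vector would be passed to the fully connected classifier.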
Authors
胡文星
蔡佳欣
柯振宇
彭烁钟
胡松
赵小燕
HU Wenxing; CAI Jiaxin; KE Zhenyu; PENG Shuozhong; HU Song; ZHAO Xiaoyan (School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China)
Source
《智能计算机与应用》
2024, No. 7, pp. 57-63 (7 pages)
Intelligent Computer and Applications
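Funding
Jiangsu Province College Students' Practice and Innovation Training Program (202311276087Y)
Nanjing Institute of Technology Research Start-up Fund for Introduced Talents (YKJ202019).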