
Homologous spectrogram feature fusion with self-attention mechanism for bird sound classification (cited by: 2)
Abstract Most current deep learning models struggle to classify bird sounds under complex background noise. Because bird sound is continuous in the time domain and varies between high and low in the frequency domain, a model that fuses homologous spectrogram features is proposed for bird sound classification under complex background noise. First, a Convolutional Neural Network (CNN) extracts Mel-spectrogram features of the bird sound. Then, specific convolution and down-sampling operations separately compress the time-domain and frequency-domain dimensions of the same Mel-spectrogram feature to 1, yielding a frequency-domain feature that retains only the high-low characteristics and a time-domain feature that retains only the continuity characteristics. Building on these operations, the Mel-spectrogram features are also extracted along the time and frequency dimensions simultaneously, yielding a time-frequency feature with both continuity and high-low characteristics. Self-attention is then applied separately to the time-domain, frequency-domain, and time-frequency features to strengthen the characteristics of each. Finally, the results of these three homologous spectrogram features are combined by decision fusion and used for bird sound classification. The proposed model was applied to audio classification of 8 bird species from the Xeno-canto website and achieved a Mean Average Precision (MAP) of 0.939 in the comparison experiments. The experimental results show that the model can cope with the poor classification performance on bird sounds under complex background noise.
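The pipeline described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not specify the "specific convolution and down-sampling operations", so mean pooling stands in for them here; the self-attention uses the input directly as queries, keys, and values with no learned projections; decision fusion is assumed to be a plain average of per-branch class probabilities; and the feature map and classifier heads are random, untrained placeholders. All function names and sizes are hypothetical.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean(xs):
    return sum(xs) / len(xs)

def frequency_feature(fmap):
    """Collapse the time axis to 1: one C-dim vector per frequency bin."""
    C, F = len(fmap), len(fmap[0])
    return [[mean(fmap[c][f]) for c in range(C)] for f in range(F)]

def time_feature(fmap):
    """Collapse the frequency axis to 1: one C-dim vector per time frame."""
    C, F, T = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[mean([fmap[c][f][t] for f in range(F)]) for c in range(C)]
            for t in range(T)]

def time_frequency_feature(fmap, pool=2):
    """Average-pool both axes by `pool`, then flatten the grid to a sequence."""
    C, F, T = len(fmap), len(fmap[0]), len(fmap[0][0])
    seq = []
    for f0 in range(0, F - pool + 1, pool):
        for t0 in range(0, T - pool + 1, pool):
            seq.append([mean([fmap[c][f][t]
                              for f in range(f0, f0 + pool)
                              for t in range(t0, t0 + pool)])
                        for c in range(C)])
    return seq

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = input (assumption)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out

def classify(seq, head):
    """Mean-pool the sequence, apply a linear layer, return class probabilities."""
    d = len(seq[0])
    pooled = [mean([vec[i] for vec in seq]) for i in range(d)]
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in head]
    return softmax(logits)

def fuse(prob_lists):
    """Decision fusion, assumed here to be the average of branch probabilities."""
    n = len(prob_lists[0])
    return [mean([p[i] for p in prob_lists]) for i in range(n)]

# Toy stand-in for a CNN feature map: C channels x F mel bins x T frames.
rng = random.Random(0)
C, F, T, N_CLASSES = 4, 6, 8, 8
fmap = [[[rng.random() for _ in range(T)] for _ in range(F)] for _ in range(C)]

branches = [frequency_feature(fmap), time_feature(fmap), time_frequency_feature(fmap)]
# One untrained, randomly initialised linear head per branch (hypothetical).
heads = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N_CLASSES)]
         for _ in branches]
branch_probs = [classify(self_attention(seq), head)
                for seq, head in zip(branches, heads)]
fused = fuse(branch_probs)
predicted_class = max(range(N_CLASSES), key=fused.__getitem__)
```

Keeping the three branches as separate sequences mirrors the abstract's idea that each homologous feature carries a different property of the same spectrogram (frequency only, time only, or both), so attention operates within one property at a time before the class scores are combined.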
Authors LIU Zhihua (刘志华); CHEN Wenjie (陈文洁); CHEN Aibin (陈爱斌) (College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China; Institute of Applied Artificial Intelligence, Central South University of Forestry and Technology, Changsha, Hunan 410004, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 4, pp. 1260-1268 (9 pages)
Funding Supported by the Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).
Keywords deep learning; bird sound classification; Convolutional Neural Network (CNN); self-attention mechanism; homologous spectrogram feature fusion