Bird sound recognition based on convolutional neural network and Transformer network
Abstract To address the limited feature extraction and low classification accuracy of traditional bird sound recognition algorithms, this paper proposes a bird sound recognition method that combines a convolutional neural network (CNN) with a Transformer network. The method accounts for both local feature learning and global context dependency. Short-time Fourier transform (STFT) spectrogram features are extracted from the raw bird sound signal and fed into the CNN to obtain local spectral feature information. In parallel, the log-Mel features of the signal and their first-order and second-order differences are extracted and combined into a mixed Mel frequency cepstrum coefficient (MFCC) feature vector, which is fed into the Transformer network to obtain global sequence feature information. Finally, the extracted features are fused to give richer bird sound feature parameters, and a Softmax classifier produces the recognition result. In experiments on the Birdsdata and xeno-canto bird sound datasets, the method reaches average recognition accuracies of 97.81% and 89.47%, respectively, higher than those of other existing bird sound recognition models.
Authors WANG Jihao; ZHOU Xiaoyan; LI Dapeng; HAN Zhichao; WANG Lili (College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China)
Source Technical Acoustics (《声学技术》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 675-683 (9 pages)
Keywords bird sound recognition; feature extraction; convolutional neural network (CNN); Transformer network
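
The two-branch pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, and all function names are hypothetical. The Mel filterbank is omitted, so a log STFT spectrogram stands in for the log-Mel features. `np.gradient` is a simple stand-in for whatever difference operator the paper uses. Fusion by concatenation followed by one linear layer is an assumption, since the abstract does not say how the features are fused. The CNN and Transformer encoders themselves are not modeled.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude STFT spectrogram, shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def add_deltas(features):
    """Stack static features with first- and second-order time differences.

    features has shape (n_frames, n_coeffs); the result has 3 * n_coeffs columns.
    np.gradient (central differences) is an assumed stand-in for the paper's
    difference operator.
    """
    delta = np.gradient(features, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.concatenate([features, delta, delta2], axis=1)


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of class scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def fuse_and_classify(local_vec, global_vec, weights, bias):
    """Concatenate two branch embeddings and apply a linear layer plus softmax.

    Concatenation is an assumption; local_vec and global_vec stand for the
    outputs of the CNN branch and the Transformer branch respectively.
    """
    fused = np.concatenate([local_vec, global_vec])
    return softmax(weights @ fused + bias)


if __name__ == "__main__":
    sr = 16000                               # assumed sampling rate
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 2000 * t)      # synthetic 2 kHz test tone
    spec = stft_magnitude(tone)              # input for the CNN branch
    seq = add_deltas(np.log(spec + 1e-8))    # input for the Transformer branch
    print(spec.shape, seq.shape)
```

In a real system the two inputs would pass through trained CNN and Transformer encoders before fusion; here `fuse_and_classify` only shows where the resulting embeddings would meet.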