
Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition (cited by: 1)
Abstract: To further improve the accuracy of bird call monitoring during nocturnal migration, this paper proposes a bird call recognition algorithm based on deep feature fusion of multi-dimensional neural networks. First, log-scaled Mel spectrograms of the bird calls are extracted as training features for a VGG Style model, which enhances the energy distribution of the time-frequency spectrogram, and mixup is used to generate virtual training data and reduce overfitting. The pre-trained VGG Style model then serves as a feature extractor that produces deep features for each bird call segment. In view of the complementarity of models of different dimensionality, 1D CNN-LSTM, 2D VGG Style, and 3D DenseNet121 models are each used as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition is used as the pooling method: a 9-level wavelet decomposition is applied to the time-domain and frequency-domain representations of each call, and multi-level LBP features are generated from them as training input to capture richer time-frequency information. The fully connected layers of the CNN-LSTM and DenseNet121 are also optimized to reduce the number of model parameters and improve real-time performance. Finally, the deep features of the three models are fused and fed to a shallow classifier (K-nearest neighbor). On the public CLO-43SD dataset, which contains 5428 flight calls spanning 43 bird species, the method achieves a balanced accuracy of 93.89%, exceeding the most recent fusion of Mel-VGG and Subnet-CNN by 7.58%.
Authors: JI Xunsheng; JIANG Kun; XIE Jie (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source: Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 4, pp. 844-853 (10 pages)
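Funding: National Natural Science Foundation of China (61902154); Fundamental Research Funds for the Central Universities (JUSRP11924); Natural Science Foundation of Jiangsu Province (BK2019043526); Jiangsu Provincial Key Research and Development Program, Modern Agriculture (BE2018334).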
Keywords: bird sound classification; 1D CNN-LSTM; 2D VGG Style; 3D DenseNet121; deep feature fusion
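
The abstract mentions mixup augmentation, fusion of deep features from three extractors, K-nearest-neighbor classification, and balanced accuracy as the metric. The following is a minimal sketch of those four ideas in plain Python, not the authors' implementation. Several details are assumptions made for illustration only: fusion is done by simple concatenation, distance is Euclidean, `k=3`, the function names are hypothetical, and the feature values are toy numbers rather than outputs of the actual CNN-LSTM, VGG Style, or DenseNet121 models.

```python
import math
from collections import Counter, defaultdict


def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their one-hot labels with weight lam in [0, 1].

    In practice lam is usually sampled from a Beta distribution;
    here it is passed in explicitly to keep the example deterministic.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def fuse(*feature_vectors):
    """Fuse per-model deep features by concatenation (an assumed fusion rule)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare species weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy stand-ins for the 1D, 2D, and 3D extractor outputs (hypothetical values).
train_x = [
    fuse([0.0, 0.1], [0.2], [0.1]),
    fuse([0.1, 0.0], [0.1], [0.2]),
    fuse([1.0, 0.9], [0.8], [0.9]),
    fuse([0.9, 1.0], [0.9], [1.0]),
]
train_y = ["species_a", "species_a", "species_b", "species_b"]

query = fuse([0.95, 0.95], [0.85], [0.9])
prediction = knn_predict(train_x, train_y, query, k=3)

# A classifier that always answers "a" is right on 3 of 4 calls,
# yet its balanced accuracy is only the mean of recalls 1.0 and 0.0.
score = balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "a", "a"])
```

Balanced accuracy is a natural choice for a 43-species dataset where some species have far fewer recordings than others: plain accuracy would reward a model that ignores the rare classes, whereas the per-class average does not.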