Abstract
To further improve the accuracy of bird-call monitoring during nocturnal migration, this paper proposes a bird sound classification algorithm based on deep feature fusion across multi-dimensional neural networks. First, the log-scaled Mel spectrogram of each call is extracted as the training feature for an improved VGG Style model, which enhances the energy distribution of the time-frequency spectrogram, and mixup is used to generate virtual training data and reduce over-fitting. The pre-trained VGG Style model is then used as a feature extractor to produce deep features for each bird call. Because models of different dimensionality are complementary, a 1D CNN-LSTM, a 2D VGG Style, and a 3D DenseNet121 are each employed as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition serves as the pooling method: a 9-level wavelet decomposition is applied separately to the time domain and the frequency domain of each call, and multi-level LBP features are extracted as training input to capture richer time-frequency information. The fully connected layers of the CNN-LSTM and DenseNet121 are also optimized to reduce the number of model parameters and improve real-time performance. Finally, the deep features of the three models are fused and fed to a shallow classifier, K-nearest neighbor. On the public CLO-43SD dataset, which contains 5428 flight calls from 43 species, the method achieves a balanced accuracy of 93.89%, exceeding the latest fusion of Mel-VGG and Subnet-CNN by 7.58%.
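The abstract names three generic building blocks: mixup augmentation, fusion of deep features followed by a K-nearest-neighbor classifier, and balanced accuracy as the metric. The sketch below illustrates each with the Python standard library only. It is not the authors' implementation: the function names, the Beta parameter `alpha=0.2`, fusion by plain concatenation, the Euclidean distance, and the value of `k` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels (mixup augmentation).

    alpha=0.2 is an assumed value, not taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def fuse(*feature_vectors):
    """Fuse per-model deep features by concatenation (an assumed scheme)."""
    return [v for vec in feature_vectors for v in vec]


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training vectors closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare species weigh as much as common ones."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

In a pipeline of this shape, `fuse` would receive one feature vector per extractor (for example the 1D, 2D, and 3D models) for each call, `knn_predict` would label the fused vector, and `balanced_accuracy` would score the predictions. Balanced accuracy suits a dataset such as CLO-43SD if the number of calls per species is uneven, because each class contributes equally to the mean.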
Authors
JI Xunsheng (吉训生)
JIANG Kun (江昆)
XIE Jie (谢捷)
Affiliations:
School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China
Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Journal of Signal Processing (《信号处理》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2022, Issue 4, pp. 844-853 (10 pages)
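Funding
National Natural Science Foundation of China (61902154)
Fundamental Research Funds for the Central Universities (JUSRP11924)
Natural Science Foundation of Jiangsu Province (BK2019043526)
Jiangsu Provincial Key Research and Development Program, Modern Agriculture (BE2018334)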