Wildlife Image Recognition of Infrared Cameras in Beijing Area Based on an Improved ConvNeXt Model
Abstract
【Objective】Wildlife images captured by infrared cameras come in large volumes, contain a high proportion of invalid images, and have complex backgrounds. To address these problems, this study proposes a model that recognizes such images automatically and with high accuracy, providing more efficient support for biodiversity research and wildlife conservation.
【Method】About 5 TB of image data captured over the past 4 years by infrared cameras at the stations of the Beijing Ecological Observatory Network were collected and organized. After manual annotation and data augmentation, a self-built dataset of 4234 images in 10 categories was created. Based on the ConvNeXt convolutional neural network and the characteristics of the Beijing wildlife image dataset, the BSGG-ConvNeXt model was designed, using BlurPool, SENet, a global response normalization (GRN) layer, and GCNet to improve recognition ability. The effect of training strategies on the recognition accuracy of ConvNeXt was explored on the self-built dataset, and the advantages of BSGG-ConvNeXt were established by comparison with other classic models. The public infrared wildlife datasets Snapshot Serengeti (SS) and Caltech Camera Traps (CCT) were used to verify the model's generalization ability.
【Result】Taking the ConvNeXt-T size of ConvNeXt as an example, its accuracy on the self-built dataset was 74.13%, with 4.47×10^9 multiply-accumulate operations (MACs). Among the individual improvements, BlurPool raised accuracy by 2.2% and reduced MACs to 1.07×10^9; SENet raised accuracy by 3.2%; GRN with the scaling layer removed raised accuracy to 87.18% and increased the parameter count to 27.88×10^6; GCNet raised accuracy to 75.44% without increasing computation, but increased the parameter count to 28.25×10^6. Combining these improvements and applying them to ConvNeXt-T gave the BSGG-ConvNeXt-T model: the parameter count increased slightly, but MACs fell to 1.07×10^9 and accuracy rose to 83.63%, higher than the original model. With pre-trained weights, BSGG-ConvNeXt-T reached 94.07% accuracy, higher than ResNet-50 (76.39%), ResNeXt-50 (87.60%), MobileViT (90.00%), DenseNet (87.66%), RegNet (69.90%), ConvNeXt V2 (91.93%), Swin Transformer (86.23%), and MobileOne (71.53%). Applied to four different network sizes of ConvNeXt, BSGG-ConvNeXt outperformed the unimproved model on the self-built dataset in every case. BSGG-ConvNeXt reached a recognition accuracy of 50.28% on the SS dataset and 56.15% on the CCT dataset, both higher than the original model.
【Conclusion】The BSGG-ConvNeXt model recognizes wildlife images captured by infrared cameras with higher accuracy, performs well on both the self-built and the public infrared wildlife image datasets, and shows a certain degree of generalization ability.
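The abstract names GRN as one of the four added components but does not give its formula. The following is a minimal sketch, assuming the formulation published with ConvNeXt V2: a per-channel L2 norm over all spatial positions, divided by the mean of those norms across channels, then a learnable scale and bias with a residual connection. The function name `grn`, the nested-list layout `x[h][w][c]`, and the `eps` value are illustrative assumptions, not the authors' implementation.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global response normalization on a feature map stored as x[h][w][c].

    gamma and beta are per-channel lists (learnable parameters in a real
    network; plain floats here). Assumed formulation, see the lead-in.
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])

    # 1) Global aggregation: L2 norm of each channel over all positions.
    g = [
        math.sqrt(sum(x[i][j][c] ** 2
                      for i in range(height) for j in range(width)))
        for c in range(channels)
    ]

    # 2) Divisive normalization: each channel norm relative to the mean norm.
    mean_g = sum(g) / channels
    n = [g_c / (mean_g + eps) for g_c in g]

    # 3) Calibration: scale and shift, plus the residual (identity) term.
    return [
        [
            [gamma[c] * x[i][j][c] * n[c] + beta[c] + x[i][j][c]
             for c in range(channels)]
            for j in range(width)
        ]
        for i in range(height)
    ]


if __name__ == "__main__":
    # Toy 2x2 map with two channels: channel 0 is strong, channel 1 is weak.
    feature_map = [[[3.0, 1.0], [3.0, 1.0]], [[3.0, 1.0], [3.0, 1.0]]]
    out = grn(feature_map, gamma=[1.0, 1.0], beta=[0.0, 0.0])
    print(out[0][0])
```

With `gamma` and `beta` set to zero the layer is an identity mapping, which is why it can be added to an existing block without disturbing it at initialization. With nonzero `gamma`, channels whose global response exceeds the channel average are amplified relative to weaker ones.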
Authors Qi Jiandong; Zheng Shangzi; Chen Ziyi; Ma Zhongtian (College of Information, Beijing Forestry University, Beijing 100083; Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083; School of Artificial Intelligence, Tangshan University, Tangshan 063000)
Source Scientia Silvae Sinicae (《林业科学》; indexed in EI, CAS, CSCD, and PKU Core), 2024, No. 8, pp. 33-45 (13 pages)
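Funding National Key Research and Development Program of China project "Adaptation Mechanisms of Typical Plantation Ecosystems to Global Change" (2020YFA0608100); National Natural Science Foundation of China project "Comprehensive Assessment of the Quality and Stability of Plantation Ecosystems under Global Change" (32071842).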
Keywords wildlife; image recognition; deep learning; convolutional neural network; ConvNeXt