Abstract
Birds flying near airports pose a serious threat to flight safety. To prevent bird strikes, an airport needs to identify the species of each bird and automatically apply a matching deterrent, such as acoustic or light-based repelling. Compared with applying a mix of deterrents indiscriminately, this saves considerable manpower and resources. Identifying bird species at airports is, however, a difficult problem. First, as a fine-grained classification task, it involves high similarity between species, while images within a single species are highly sensitive to variation. Second, few images are available for each species, which easily leads to overfitting. Finally, bird images captured at airports often appear as silhouettes, show ghosting, or are partly occluded, so they lose many feature details compared with normally photographed images. To address these problems, this work builds a dataset covering hazardous bird species, based on actual airport conditions, and proposes a preprocessing method centered on a binarization algorithm. For fine-grained image classification, two approaches are presented. First, to handle the high similarity between species, stacked Swin Transformers are used as the backbone network to extract fine-grained feature representations, and the center loss function is combined with the supervised softmax loss function, giving better results than conventional architectures and losses. Second, to cope with low image quality and small sample sizes, ensemble learning is used: different network architectures extract feature representations so that the image information is fully exploited. Experiments show that the first approach reaches a recognition rate above 90% on the NABirds dataset and 64% on the integrated dataset, while ensemble learning effectively extracts features from low-quality images and achieves satisfactory results.
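The combination of center loss and softmax loss described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming the commonly used center-loss form (half the squared distance between a feature and its class center) added to softmax cross-entropy. The abstract does not give the formula or the weighting, so the function names, the `lam` weight, and the data layout are all assumptions rather than the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance from a feature to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(batch, centers, lam=0.01):
    """Batch mean of softmax loss plus lam times center loss.

    batch:   list of (feature, logits, label) tuples
    centers: dict mapping label -> class center vector
    lam:     hypothetical weighting, not a value taken from the paper
    """
    total = 0.0
    for feature, logits, label in batch:
        total += softmax_cross_entropy(logits, label)
        total += lam * center_loss(feature, centers[label])
    return total / len(batch)
```

The softmax term separates classes, while the center term pulls features of the same class toward a shared center, which is the usual motivation for pairing the two when classes look alike. In a real training loop the centers would be updated alongside the network, which this sketch leaves out.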
Authors
曹辰鹏
易浩
张栗粽
母翀
CAO Chenpeng; YI Hao; ZHANG Lizong; MU Chong (University of Electronic Science and Technology of China, Chengdu 611731, China)
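Funding
National Natural Science Foundation of China (62271125, 62273071)
Sichuan Science and Technology Program (2022YFG0038, 2021YFG0018)
Fundamental Research Funds for the Central Universities (ZYGX2020ZB034, ZYGX2021J019)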
Keywords
artificial intelligence
deep learning
fine-grained visual categorization
ensemble learning