Abstract
Bird images from different subcategories look alike, while images of the same category show large intra-class variance because of complex backgrounds and varying poses. To address this problem, a convolutional neural network model based on multi-scale attention is proposed. Through an object module and a part module that require no learned parameters, the model shifts its attention step by step from the global image to the object image and then to the part images, forming a three-branch network that takes multi-scale images as input. In addition, a rank loss is introduced to reduce background interference. On the CUB-200-2011 and NABirds datasets, the model reaches recognition accuracies of 87.21% and 85.96%, respectively, improving on the baseline model and verifying the effectiveness of the approach.
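The abstract does not say how the parameter-free modules localize the object or what form the rank loss takes. The sketch below is therefore only an illustration under stated assumptions. `attention_bbox` assumes that a crop is taken from a channel-summed activation map by thresholding at its mean. `rank_loss` assumes a common pairwise hinge form that asks the finer-scale branch to be more confident in the true class than the coarser one. Both function names and the `margin` value are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def attention_bbox(act: List[List[float]]) -> Tuple[int, int, int, int]:
    """Bounding box (top, left, bottom, right), inclusive, of the cells
    whose activation exceeds the map's mean.

    Assumption: `act` is a 2-D activation map already summed over channels.
    No learned parameters are involved; the threshold is the map's own mean.
    """
    flat = [v for row in act for v in row]
    mean = sum(flat) / len(flat)
    rows = [i for i, row in enumerate(act) if any(v > mean for v in row)]
    cols = [j for j in range(len(act[0])) if any(row[j] > mean for row in act)]
    if not rows:
        # Constant map: nothing stands out, so keep the whole image.
        return 0, 0, len(act) - 1, len(act[0]) - 1
    return rows[0], cols[0], rows[-1], cols[-1]


def rank_loss(p_coarse: float, p_fine: float, margin: float = 0.05) -> float:
    """Pairwise hinge penalty (assumed form): zero once the finer-scale
    branch beats the coarser branch's true-class probability by `margin`."""
    return max(0.0, p_coarse - p_fine + margin)
```

For a 4x4 map whose central 2x2 block is the only non-zero region, `attention_bbox` returns the box around that block. `rank_loss(0.6, 0.8)` is zero, because the finer branch is already more confident by more than the margin.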
Authors
RUAN Tao (阮涛)
HAO Zhicheng (郝智程)
Institute of Applied Mathematics, Beijing Information Science & Technology University, Beijing 100010
Source
Computer & Digital Engineering (《计算机与数字工程》)
2024, No. 10, pp. 3148-3152, 3171 (6 pages in total)
Keywords
bird image recognition
multi-scale attention
rank loss
convolutional neural network