Abstract
Fine-grained visual classification (FGVC) is extremely challenging because of the subtle inter-class differences and large intra-class differences among images. To learn the latent features of fine-grained images more effectively, this paper introduces knowledge distillation into FGVC and proposes TRS-DeiT, a fine-grained classification method based on knowledge distillation and target region selection that combines the respective strengths of CNN and Transformer models. In addition, a novel target region selection module in TRS-DeiT extracts the most discriminative regions, and a contrastive loss that measures the similarity between images of different classes is introduced to separate easily confused classes. Trained and evaluated on three standard fine-grained datasets, CUB-200-2011, Stanford Cars and Stanford Dogs, the model achieves accuracies of 90.8%, 95.0% and 95.1%, respectively. The experimental results show that the proposed model outperforms traditional models, and the visualization results further confirm that its attention concentrates mainly on the objects to be recognized, which makes it better suited to fine-grained visual classification tasks.
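The abstract describes three ingredients: knowledge distillation from a CNN teacher into a DeiT-style student, attention-based selection of the most discriminative regions, and a contrastive loss over image pairs. The following is a minimal PyTorch-style sketch of how these pieces could fit together; all function and variable names are hypothetical, and the details (token shapes, temperature, margin, loss weights) are assumptions rather than the paper's actual implementation.

```python
# Minimal sketch under stated assumptions; not the authors' released code.
import torch
import torch.nn.functional as F

def select_target_regions(tokens, cls_attention, k):
    """Keep the k patch tokens that receive the highest attention from the
    class token, i.e. the most discriminative regions (hypothetical module)."""
    # tokens: (B, N, D) patch tokens; cls_attention: (B, N) CLS-to-patch attention
    topk = cls_attention.topk(k, dim=1).indices                 # (B, k)
    idx = topk.unsqueeze(-1).expand(-1, -1, tokens.size(-1))    # (B, k, D)
    return tokens.gather(1, idx)                                # (B, k, D)

def contrastive_loss(features, labels, margin=0.5):
    """Pull same-class embeddings together and push different-class
    (easily confused) embeddings apart once similarity exceeds the margin."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t()                               # cosine similarity (B, B)
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=labels.device)
    pos = (1.0 - sim) * same * (1.0 - eye)                      # same class: raise similarity
    neg = F.relu(sim - margin) * (1.0 - same)                   # different class: penalize high similarity
    return (pos.sum() + neg.sum()) / (len(labels) ** 2)

def distillation_loss(student_logits, teacher_logits, T=3.0):
    """Soft-label knowledge distillation from a CNN teacher to the Transformer student."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

# Illustrative combination of the terms (weights are assumptions, not from the paper):
# loss = F.cross_entropy(logits, labels) \
#        + distillation_loss(dist_logits, teacher_logits) \
#        + 0.1 * contrastive_loss(cls_embedding, labels)
```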
Authors
Zhao Tingting
Gao Huan
Chang Yuguang
Chen Yarui
Wang Yuan
Yang Jucheng
Zhao Tingting; Gao Huan; Chang Yuguang; Chen Yarui; Wang Yuan; Yang Jucheng (College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2023, No. 9, pp. 2863-2868 (6 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (61976156)
Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560).