Abstract
Classifying and locating objects of interest in Thangka images helps people understand the rich semantic information of Thangka and supports the transmission of this cultural heritage. To address the scarcity of Thangka image samples, complex backgrounds, occluded detection targets, and low detection accuracy, this paper proposes a few-shot object detection algorithm for Thangka images that combines multi-scale context information with dual attention guidance. First, a new multi-scale feature pyramid is constructed to learn multi-level features and contextual information from Thangka images, improving the model's ability to discriminate targets at different scales. Second, a dual attention guidance module is added at the end of the feature pyramid to strengthen the model's representation of key features while reducing the influence of noise. Finally, Rank&Sort Loss replaces the cross-entropy classification loss, which reduces the complexity of model training and improves detection accuracy. Experimental results show that, in 10-shot experiments, the proposed method achieves a mean average precision of 19.7% on a Thangka dataset and 11.2% on the COCO dataset.
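The abstract names a dual attention guidance module but does not describe its internals. As a rough illustration of what "dual attention" commonly refers to (reweighting channels, then reweighting spatial positions, in the style of CBAM), the following is a minimal, parameter-free Python sketch. It is an assumption for illustration only and not the authors' module: a real module would use learned layers, and the names `channel_attention`, `spatial_attention`, and `dual_attention` are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: fmap[c][y][x] (channels x height x width).
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    """Logistic function used to turn activations into weights in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each channel by sigmoid(global average of that channel).

    Simplification (assumption): no learned MLP, the pooled mean is used directly.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each position by sigmoid(mean over channels at that position).

    Simplification (assumption): no learned convolution over the pooled map.
    """
    num_c = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[c][y][x] for c in range(num_c)) / num_c)
             for x in range(width)]
            for y in range(height)]
    return [[[fmap[c][y][x] * mask[y][x] for x in range(width)]
             for y in range(height)]
            for c in range(num_c)]


def dual_attention(fmap: FeatureMap) -> FeatureMap:
    """Hypothetical dual attention: channel attention, then spatial attention."""
    return spatial_attention(channel_attention(fmap))
```

Because each channel weight comes from that channel's mean activation, strongly activated channels are largely kept while weakly or negatively activated ones are damped; the spatial step then applies the same idea per position, which is one way such a module can suppress background noise.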
Authors
HU Wenjin; TANG Huiyuan; YUE Chaoyang; SONG Huafei
(Key Laboratory of China's Ethnic Languages and Information Technology Ministry of Education, Northwest Minzu University, Lanzhou 730030, China; School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China)
Source
Optics and Precision Engineering
EI
CAS
CSCD
Peking University Core Journal
2023, No. 12, pp. 1859-1869 (11 pages)
Funding
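National Natural Science Foundation of China (No. 62061042, No. 61862057)
Innovation Team Program of the National Ethnic Affairs Commission (No. 2018[98])
Special Fund of Northwest Minzu University for "Double First-Class" and Characteristic Development Guidance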
Keywords
Thangka
few-shot object detection
contextual information
multi-scale feature
dual attention mechanism