
Multi-label Image Classification Method Based on Prompt Learning and Attention Mechanism
Abstract Traditional multi-label classification models tend to ignore correlations between labels, and the cost of label annotation keeps rising. To address both problems, this paper proposes a multi-label image classification method that combines prompt learning with a cross-attention mechanism and is trained on partially labeled data. First, prompts are combined with labels to form the text input, which a pre-trained text encoder encodes to extract text features. Next, images are fed to an image encoder. Learnable prompts are added to both the text and image encoders to improve model performance. In addition, a cross-attention mechanism promotes information exchange between the two modalities and thereby improves classification. Experiments show that the model reaches a mean Average Precision (mAP) of 94.6% on the PASCAL Visual Object Classes 2007 (VOC2007) dataset when 90% of the ground-truth labels are used.
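The abstract names two mechanisms that a small example can make concrete: cross-attention between label text features and image features, and training with only part of the labels known. The following is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical, the feature vectors are toy values, learned projection matrices and the prompt-augmented encoders are omitted, and unknown labels are simply skipped in a binary cross-entropy loss, which is one common way to handle partial labels and may differ from the paper's loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(label_feats, patch_feats):
    """Each label (text) feature acts as a query over image patch features.

    Assumption: keys and values are the raw patch features; a real model
    would apply learned query/key/value projections first.
    Returns one attended image vector per label.
    """
    dim = len(patch_feats[0])
    attended = []
    for query in label_feats:
        # Scaled dot-product score between this label and every patch.
        scores = [
            sum(q * k for q, k in zip(query, patch)) / math.sqrt(dim)
            for patch in patch_feats
        ]
        weights = softmax(scores)
        # Weighted sum of patch features, one component at a time.
        attended.append([
            sum(w * patch[i] for w, patch in zip(weights, patch_feats))
            for i in range(dim)
        ])
    return attended


def label_logits(label_feats, attended):
    """Score each label by the dot product with its attended image vector."""
    return [
        sum(q * v for q, v in zip(query, vec))
        for query, vec in zip(label_feats, attended)
    ]


def partial_bce(logits, targets):
    """Binary cross-entropy averaged over known labels only.

    targets: 1 = positive, 0 = negative, None = unknown (skipped).
    """
    losses = []
    for z, t in zip(logits, targets):
        if t is None:
            continue
        # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
        losses.append(max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy data: 3 labels, 4 image patches, feature dimension 2.
    labels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    patches = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.4], [1.0, 0.0]]
    logits = label_logits(labels, cross_attention(labels, patches))
    # The third label is unannotated and does not contribute to the loss.
    print(partial_bce(logits, [1, 0, None]))
```

Because the loss averages only over annotated labels, an image with a single known label still yields a usable training signal, which is the practical appeal of partial-label training when full annotation is expensive.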
Authors WANG Rui (汪瑞); WU Fangyu (武芳宇); ZHANG Bailing (张百灵). Affiliations: School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China; School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; School of Computing and Data Engineering, NingboTech University, Ningbo 315100, China
Source Software Engineering (《软件工程》), 2024, No. 7, pp. 42-46 (5 pages)
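Funding Ningbo Science and Technology Program (2022Z082, 2023Z069); Natural Science Foundation of Zhejiang Province (LY23F020014); Suzhou Science and Technology Program (ZXL2023176).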
Keywords multi-label classification; prompt learning; attention mechanism