Abstract
To improve the performance and generalization ability of few-shot image classification by fully exploiting large-scale vision-language pre-trained models, an efficient method for few-shot image classification is proposed. First, in the text encoding branch, multiple learnable text prompts are integrated to fully explore how the position of the image class label within the prompt affects the model's generalization performance. Second, in the image encoding branch, a learnable visual prompt is introduced so that the pre-trained image parameters can better represent few-shot images. Finally, feature adapters are appended after the image and text feature encoders, and the network is fine-tuned on image classification datasets to improve its performance on few-shot image classification benchmarks. Extensive experiments on 10 public datasets show that, compared with existing methods, the proposed approach improves average one-shot classification accuracy by 2.9%.
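The abstract names three components: learnable text prompts in which the class token can sit at different positions, a learnable visual prompt prepended to the image tokens, and residual feature adapters after both encoders. A minimal numpy sketch of these ideas follows; all dimensions, function names, the mean-pooling stand-in for the frozen encoders, and the cosine-similarity classifier are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8       # embedding dimension (illustrative)
N_CTX = 4   # number of learnable context tokens per text prompt

def build_prompt(ctx, cls_emb, position):
    """Place the class-token embedding at a chosen position
    ('front', 'middle', or 'end') inside the learnable context."""
    if position == "front":
        return np.vstack([cls_emb, ctx])
    if position == "end":
        return np.vstack([ctx, cls_emb])
    half = len(ctx) // 2
    return np.vstack([ctx[:half], cls_emb, ctx[half:]])

def adapter(feat, W1, W2, alpha=0.2):
    """Residual feature adapter: blend the adapted feature
    (two-layer MLP with ReLU) with the original feature."""
    adapted = np.maximum(feat @ W1, 0) @ W2
    return alpha * adapted + (1 - alpha) * feat

# Learnable parameters (randomly initialised; trained in practice)
ctx = rng.normal(size=(N_CTX, D))         # text context tokens
visual_prompt = rng.normal(size=(1, D))   # token prepended to image patches
W1 = rng.normal(size=(D, D // 2))
W2 = rng.normal(size=(D // 2, D))

cls_emb = rng.normal(size=(1, D))              # embedding of one class name
prompt = build_prompt(ctx, cls_emb, "middle")  # one of several prompt variants
patches = rng.normal(size=(3, D))              # image patch embeddings
image_tokens = np.vstack([visual_prompt, patches])

# Mean-pooling stands in for the frozen encoders; adapters follow them
text_feat = adapter(prompt.mean(axis=0), W1, W2)
img_feat = adapter(image_tokens.mean(axis=0), W1, W2)

# Cosine similarity gives the classification logit for this class
logit = text_feat @ img_feat / (
    np.linalg.norm(text_feat) * np.linalg.norm(img_feat))
print(prompt.shape, image_tokens.shape)
```

In a full implementation the context tokens, visual prompt, and adapter weights would be optimised on the few-shot training set while the pre-trained encoder weights stay frozen.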
Authors
LI Baoan; WANG Xinyu; TENG Shangzhi; LYU Xueqiang (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China)
Source
Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》)
Indexed in: EI; CAS; CSCD; PKU Core (北大核心)
2024, No. 2, pp. 11-17 (7 pages)
Funding
National Natural Science Foundation of China (62171043, 62202061)
Beijing Natural Science Foundation (4212020)
Research Project of the State Language Commission of China (ZDI145-10)
Keywords
prompt learning
visual-language model
few-shot learning
image classification
pre-trained model