Abstract
To improve the performance and generalization ability of few-shot image classification by fully exploiting large-scale vision-language pre-trained models, an efficient method for few-shot image classification is proposed. First, in the text encoding branch, multiple learnable text prompts are integrated to fully explore how the position of the image class label within the prompt affects the model's generalization performance. Second, in the image encoding branch, a learnable visual prompt is introduced so that the pre-trained image parameters can better represent few-shot images. Finally, feature adapters are appended after the image and text feature encoders, and the network is fine-tuned on image classification datasets to improve its performance on few-shot image classification benchmarks. Extensive experiments on 10 public datasets show that, compared with existing methods, the proposed approach improves average one-shot classification accuracy by 2.9%.
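The abstract names three components: learnable text prompts in which the class token can sit at different positions, a learnable visual prompt prepended to the image tokens, and residual feature adapters after both encoders. A minimal numpy sketch of these ideas follows; all dimensions, function names, the mean-pooling stand-in for the frozen encoders, and the cosine-similarity classifier are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8       # embedding dimension (illustrative)
N_CTX = 4   # number of learnable context tokens per text prompt

def build_prompt(ctx, cls_emb, position):
    """Place the class-token embedding at a chosen position
    ('front', 'middle', or 'end') inside the learnable context."""
    if position == "front":
        return np.vstack([cls_emb, ctx])
    if position == "end":
        return np.vstack([ctx, cls_emb])
    half = len(ctx) // 2
    return np.vstack([ctx[:half], cls_emb, ctx[half:]])

def adapter(feat, W1, W2, alpha=0.2):
    """Residual feature adapter: blend the adapted feature
    (two-layer MLP with ReLU) with the original feature."""
    adapted = np.maximum(feat @ W1, 0) @ W2
    return alpha * adapted + (1 - alpha) * feat

# Learnable parameters (randomly initialised; trained in practice)
ctx = rng.normal(size=(N_CTX, D))         # text context tokens
visual_prompt = rng.normal(size=(1, D))   # token prepended to image patches
W1 = rng.normal(size=(D, D // 2))
W2 = rng.normal(size=(D // 2, D))

cls_emb = rng.normal(size=(1, D))              # embedding of one class name
prompt = build_prompt(ctx, cls_emb, "middle")  # one of several prompt variants
patches = rng.normal(size=(3, D))              # image patch embeddings
image_tokens = np.vstack([visual_prompt, patches])

# Mean-pooling stands in for the frozen encoders; adapters follow them
text_feat = adapter(prompt.mean(axis=0), W1, W2)
img_feat = adapter(image_tokens.mean(axis=0), W1, W2)

# Cosine similarity gives the classification logit for this class
logit = text_feat @ img_feat / (
    np.linalg.norm(text_feat) * np.linalg.norm(img_feat))
print(prompt.shape, image_tokens.shape)
```

In a full implementation the context tokens, visual prompt, and adapter weights would be optimised on the few-shot training set while the pre-trained encoder weights stay frozen.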
Authors
LI Baoan; WANG Xinyu; TENG Shangzhi; LYU Xueqiang (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China)
Source
Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》)
Indexed in: EI; CAS; CSCD; PKU Core (北大核心)
2024, No. 2, pp. 11-17 (7 pages)
Funding
National Natural Science Foundation of China (62171043, 62202061)
Beijing Natural Science Foundation (4212020)
Research Project of the State Language Commission of China (ZDI145-10)
Keywords
prompt learning
visual-language model
few-shot learning
image classification
pre-trained model