Image Retrieval Method with Attention-Enhanced Vision Transformer (注意力增强的视觉Transformer图像检索算法)

Abstract: Image retrieval methods based on deep hashing typically rely on convolution and pooling to extract local image information, and they must keep deepening the network to capture global long-range dependencies. As a result, they generally incur high complexity and computational cost. This paper proposes an attention-enhanced vision Transformer image retrieval algorithm. It uses a pre-trained vision Transformer as the baseline model to speed up convergence, and achieves efficient image retrieval through improvements to the backbone network and the design of the hash function. On the one hand, an attention enhancement module is designed to capture the local salient information and visual details of the input feature map, learn corresponding weights that highlight important features, and strengthen the representational power of the image features fed to the Transformer encoder. On the other hand, to improve retrieval efficiency, a contrastive hash loss function is designed to generate discriminative binary hash codes, which reduces memory requirements and computational complexity. Experimental results on the CIFAR-10 and NUS-WIDE datasets show that the proposed method achieves a mean average precision (mAP) of 96.8% and 86.8%, respectively, across different hash code lengths, outperforming several classic deep hashing algorithms and two other Transformer-based image retrieval algorithms.
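The abstract states that binary hash codes reduce memory and computation but does not give the retrieval procedure. The following is a minimal sketch of generic hash-based retrieval, not the authors' implementation: real-valued outputs are binarized by sign, packed into integers, and ranked by Hamming distance. The function names (`binarize`, `hamming_distance`, `rank_by_hamming`) and the 8-dimensional feature values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an integer: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two packed hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Hypothetical 8-dimensional hash-layer outputs (made-up values, not from the paper).
db_features = [
    [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5],
    [-0.6, 0.3, -0.1, 0.5, -0.9, 0.2, -0.4, 0.7],
    [0.8, -0.1, 0.5, -0.6, -0.2, -0.4, 0.9, -0.3],
]
query_features = [0.7, -0.3, 0.2, -0.8, 0.3, -0.1, 0.6, -0.4]

db_codes = [binarize(f) for f in db_features]
query_code = binarize(query_features)
ranking = rank_by_hamming(query_code, db_codes)
print(ranking)  # [0, 2, 1]
```

Each image is stored as a single small integer rather than a float vector, and comparison is an XOR followed by a bit count, which is where the memory and computation savings of hashing come from.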

Authors: Liu Huayong (刘华咏); Huang Cong (黄聪); Jin Hanjun (金汉均) (School of Computer Science, Central China Normal University, Wuhan 430070, China)

Affiliation: School of Computer Science, Central China Normal University

Source: Electronic Measurement Technology (《电子测量技术》), PKU Core Journal, 2023, No. 23, pp. 50-55 (6 pages)

Funding: Supported by the Humanities and Social Sciences Research Project of the Ministry of Education (21YJA870005)

Keywords: image retrieval; vision Transformer; deep hashing; attention module

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
