Abstract
Current mainstream cross-modal image retrieval algorithms based on the Convolutional Neural Network (CNN) paradigm cannot effectively extract the fine details of ship images, and the cross-modal "heterogeneous gap" is difficult to eliminate. To address these problems, a Discriminant Adversarial Hash Transformer (DAHT) is proposed for fast cross-modal retrieval of ship images. The network adopts a dual-stream Vision Transformer (ViT) structure and relies on the self-attention mechanism of ViT to extract discriminative features from ship images; a Hash Token structure is designed on top of it for hash generation. To eliminate the cross-modal difference between images of the same category, the whole retrieval framework is trained in an adversarial manner, and modal confusion is achieved by applying modality discrimination to the generated hash codes. In addition, a feedback-based weighted cross-modal quintuplet loss, the Normalized discounted cumulative gain Weighting based Discriminant Cross-modal Quintuplet Loss (NW-DCQL), is designed to preserve the network's semantic discrimination between images of different categories. In four types of cross-modal retrieval experiments carried out on two datasets, the proposed method improves on the second-best retrieval results by 9.8%, 5.2%, 19.7%, and 21.6%, respectively (32 bit), and it also shows some performance advantage in unimodal retrieval tasks.
Authors
关欣
国佳恩
卢雨
GUAN Xin; GUO Jiaen; LU Yu (Naval Aviation University, Yantai 264001, China; Unit 91422 of the PLA, Yantai 265200, China)
Source
《电子与信息学报》
EI
CSCD
Peking University Core Journals (北大核心)
2023, No. 12, pp. 4411-4420 (10 pages)
Journal of Electronics & Information Technology
Funding
Taishan Scholar Project Special Fund (ts 201712072)
National Defense Science and Technology Excellent Young Scientists Fund (2017-JCJQ-ZQ-003).
Keywords
Cross-modal retrieval
Vessel image
Adversarial training
Hash transform
Transformer