Abstract
The interactive electronic technical manual (IETM) is one of the key technologies for making equipment support more informatized and intelligent. To address the problem that IETMs support retrieval in only a single modality, this paper takes the image-text descriptions in IETM data as its research object and proposes a fine-grained cross-modal retrieval algorithm that incorporates an attention mechanism. Because the images in the data are mostly simple sketches with little color, the feature extraction module uses a Vision Transformer (ViT) model and a Transformer encoder to extract global and local features from the images and the texts, respectively. An attention mechanism then mines fine-grained information both between and within the image and text modalities, adversarial training on the text input is added to improve the model's generalization ability, and a cross-modal joint loss function constrains the model. On the Pascal Sentence dataset and a self-built dataset, the proposed method achieves a mean average precision (mAP) of 0.964 and 0.959, respectively, which is 0.248 and 0.214 higher than the baseline model, deep supervised cross-modal retrieval (DSCMR).
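The results above are reported as mean average precision (mAP). The sketch below shows one common way to compute mAP for cross-modal retrieval, assuming, as is usual for label-based benchmarks such as Pascal Sentence, that a retrieved item counts as relevant when it shares the query's class label and that the whole gallery of the other modality is ranked by cosine similarity (mAP@all). The function names and toy vectors are illustrative only; the abstract does not state the exact evaluation protocol, for example whether image-to-text and text-to-image scores are reported separately or averaged.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_precision(ranked_relevance: List[bool]) -> float:
    """AP of one ranked list: mean of precision@k over the ranks k holding a relevant item."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries, query_labels, gallery, gallery_labels) -> float:
    """mAP@all (assumed protocol): each query ranks the entire gallery by cosine
    similarity; a gallery item is relevant if its class label equals the query's."""
    aps = []
    for q, ql in zip(queries, query_labels):
        order = sorted(range(len(gallery)),
                       key=lambda i: cosine(q, gallery[i]), reverse=True)
        aps.append(average_precision([gallery_labels[i] == ql for i in order]))
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy features: image embeddings as queries, text embeddings as gallery.
    img_feats, img_labels = [[1.0, 0.0], [0.0, 1.0]], [1, 1]
    txt_feats, txt_labels = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]], [0, 1, 1]
    # Image-to-text mAP; swap the argument pairs for text-to-image.
    print(round(mean_average_precision(img_feats, img_labels, txt_feats, txt_labels), 4))
    # prints 0.7917
```

Ranking the full gallery (rather than only the top k results) means every relevant item appears in the list, so dividing by the number of hits equals dividing by the total number of relevant items.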
Authors
翟一琛
顾佼佼
宗富强
姜文志
ZHAI Yichen; GU Jiaojiao; ZONG Fuqiang; JIANG Wenzhi (Coastal Defense College, Naval Aviation University, Yantai 264001, China)
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
Indexed in: EI, CSCD, Peking University Core Journals
2023, Issue 12, pp. 3915-3923 (9 pages)
Keywords
interactive electronic technical manual
image-text retrieval
cross-modal
attention mechanism