Abstract
The cross attention mechanism has made significant progress in modeling the relationship between semantic queries and image regions in image captioning. However, its visual coherence remains to be explored. To fill this gap, we propose a novel context-assisted cross attention (CACA) mechanism. With the help of historical context memory (HCM), CACA fully considers the potential impact of previously attended visual cues on the generation of the current attention context. Moreover, we present a regularization method, called adaptive weight constraint (AWC), to restrict the total weight assigned to the historical contexts of each CACA module. We apply CACA and AWC to the Transformer model and construct a context-assisted transformer (CAT) for image captioning. Experimental results on the MS COCO (Microsoft Common Objects in Context) dataset demonstrate that our method achieves consistent improvement over the current state-of-the-art methods.
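The abstract does not give the CACA or AWC equations, so the following is only a minimal sketch of one possible reading, not the authors' formulation. It assumes that the historical context memory is a list of previously produced attention contexts, that these are appended to the image-region features as extra attention candidates, and that the constraint is a simple hinge penalty on the total attention weight given to history. The function names and the `budget` parameter are hypothetical, and the fixed budget stands in for whatever adaptive rule the paper actually uses.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def context_assisted_attention(query, regions, history):
    """Single-head attention over image regions plus past contexts.

    query:   (d,)   semantic query for the current decoding step
    regions: (n, d) image-region features
    history: (m, d) previously generated attention contexts (m may be 0)
    Returns the new context (d,) and the total weight placed on history.
    """
    d = query.shape[0]
    # Assumption: history entries compete with regions in one softmax.
    candidates = np.concatenate([regions, history], axis=0)
    weights = softmax(candidates @ query / np.sqrt(d))
    context = weights @ candidates
    history_weight = weights[regions.shape[0]:].sum()
    return context, history_weight

def weight_constraint_penalty(history_weight, budget=0.5):
    # Hypothetical stand-in for AWC: penalize history weight above a budget.
    return max(0.0, history_weight - budget) ** 2

# Toy decoding loop: each new context is written back into the memory.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
history = np.zeros((0, 8))
for step in range(3):
    query = rng.normal(size=8)
    context, hw = context_assisted_attention(query, regions, history)
    loss_term = weight_constraint_penalty(hw)
    history = np.vstack([history, context])
```

In this sketch the penalty would be added to the captioning loss during training, so that the model can consult earlier contexts without letting them dominate the current image evidence.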
Authors
连政
王瑞
李海昌
姚辉
胡晓惠
LIAN Zheng; WANG Rui; LI Hai-Chang; YAO Hui; HU Xiao-Hui (University of Chinese Academy of Sciences, Beijing 101408; Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190)
Source
《自动化学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2023, Issue 9, pp. 1889-1903 (15 pages)
Acta Automatica Sinica
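Funding
Supported by the National Key Research and Development Program of China (2019YFB1405100)
and the National Natural Science Foundation of China (61802380).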