
Matching with agreement for cross-modal image-text retrieval

Cited by: 2
Abstract: Cross-modal image-text retrieval is important for understanding the correspondence between vision and language. Most existing methods use different attention modules to mine region-to-word and word-to-region alignments and thereby explore fine-grained cross-modal correlations. However, these methods rarely consider that attending in both directions can produce inconsistent alignments. This study proposes a matching with agreement (MAG) method, which exploits alignment consistency to improve cross-modal retrieval performance. An attention mechanism performs the cross-modal association alignment, and the alignment results are then used to build a cross-modal matching agreement based on a novel competitive voting strategy. This agreement measures the consistency of the cross-modal alignments and effectively improves retrieval performance. Extensive experiments on two benchmark datasets, Flickr30K and MS COCO, show that MAG achieves state-of-the-art performance, demonstrating its effectiveness.
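The abstract does not give the MAG equations, so the following Python sketch illustrates only the general idea of bidirectional attention alignment plus a consistency check, under stated assumptions: region and word features are plain lists of floats, attention is a softmax over cosine similarities with an assumed inverse temperature of 9.0, and the "agreement" is a simple mutual-best-match check that stands in for, and is not, the paper's competitive voting strategy. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (epsilon avoids 0/0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores: Vector, inv_temp: float) -> Vector:
    """Numerically stable softmax with an inverse-temperature factor."""
    scaled = [inv_temp * s for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Vector) -> int:
    return max(range(len(values)), key=values.__getitem__)


def similarity_matrix(regions: Matrix, words: Matrix) -> Matrix:
    # sim[i][j] = cosine similarity between image region i and word j.
    return [[cosine(r, w) for w in words] for r in regions]


def word_to_region_attention(sim: Matrix, inv_temp: float = 9.0) -> Matrix:
    # Row j: attention weights of word j over all regions (column j of sim).
    return [softmax(list(col), inv_temp) for col in zip(*sim)]


def region_to_word_attention(sim: Matrix, inv_temp: float = 9.0) -> Matrix:
    # Row i: attention weights of region i over all words (row i of sim).
    return [softmax(row, inv_temp) for row in sim]


def attend(weights: Vector, vectors: Matrix) -> Vector:
    # Attention-weighted sum of feature vectors.
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def text_to_image_score(regions: Matrix, words: Matrix, inv_temp: float = 9.0) -> float:
    """Average similarity between each word and its attended region context."""
    w2r = word_to_region_attention(similarity_matrix(regions, words), inv_temp)
    return sum(cosine(w, attend(a, regions)) for w, a in zip(words, w2r)) / len(words)


def agreement_score(regions: Matrix, words: Matrix, inv_temp: float = 9.0) -> float:
    """Fraction of words whose most-attended region attends back to them most.

    Hypothetical mutual-best-match check; NOT the paper's competitive voting.
    """
    sim = similarity_matrix(regions, words)
    w2r = word_to_region_attention(sim, inv_temp)
    r2w = region_to_word_attention(sim, inv_temp)
    agreed = sum(1 for j, weights in enumerate(w2r) if argmax(r2w[argmax(weights)]) == j)
    return agreed / len(words)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [1.0, 0.1]]
    words = [[1.0, 0.0], [0.0, 1.0]]
    # Word 1 picks region 1, but region 1 prefers word 0: only half agree.
    print(agreement_score(regions, words))  # 0.5
```

Because softmax is monotonic, the most-attended item in each direction is simply the most similar one, so this toy check exposes the kind of one-sided alignment the abstract describes: a word can pick a region that itself prefers a different word.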
Authors: GONG Dahan, CHEN Hui, CHEN Shijiang, BAO Yongjun, DING Guiguang (School of Software, Tsinghua University, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; Department of Automation, Tsinghua University, Beijing 100084, China; Zhuoxi Institute of Brain and Intelligence, Hangzhou 311121, China; JD.com, Inc., Beijing 100176, China)
Source: CAAI Transactions on Intelligent Systems, 2021, no. 6, pp. 1143-1150 (8 pages). Indexed in CSCD and the Peking University Core Journals list.
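Funding: National Natural Science Foundation of China (61925107, U1936202); Postdoctoral Innovative Talent Support Program of the China Postdoctoral Science Foundation (BX2021161).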
Keywords: artificial intelligence; computer vision; vision and language; cross-modal retrieval; matching with agreement; attention; convolutional neural network; recurrent neural network; gated recurrent unit