
External Knowledge-based VQA Integrating Cross-Modal Transformers
Abstract: To address the poor performance of visual question answering (VQA) models on external knowledge-based tasks, this paper constructs an external knowledge-based VQA framework integrating cross-modal Transformers. An external knowledge base is introduced alongside the VQA model to improve its reasoning ability on knowledge-based tasks. Further, the model uses a bidirectional cross-attention mechanism to strengthen the semantic interaction and fusion of the question text, the image, and the external knowledge, mitigating the insufficient reasoning ability that VQA models commonly exhibit when external knowledge is required. The results show that, on the OK-VQA dataset, the proposed model improves the overall performance metric by 15.01% over the baseline model LXMERT, and by 4.46% over the latest existing models. This demonstrates that the proposed model improves performance on external knowledge-based VQA tasks.
Authors: WANG Yu; LI Ming-feng; SUN Hai-chun (Key Laboratory of Security Prevention Technology and Risk Assessment, Ministry of Public Security, Beijing 100038, China; School of Information Network Security, People's Public Security University of China, Beijing 100026, China)
Source: Science Technology and Engineering (Peking University Core Journal), 2024, No. 20, pp. 8577-8586 (10 pages)
Funding: Ministry of Public Security Technology Research Program (2020JSYJC22); Fundamental Research Funds for the Central Universities (2022JKF02015).
Keywords: visual question answering (VQA); external knowledge; cross-modal; knowledge graph
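The abstract's core mechanism is bidirectional cross-attention between modality streams (question text, image, external knowledge). A minimal sketch of that idea, for two of the streams, is shown below; all names, dimensions, and the residual-add fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of bidirectional cross-attention fusion between a
# question-text stream and a retrieved-knowledge stream. The paper's model
# fuses three streams (text, image, knowledge) inside full Transformer
# layers; this shows only the attention exchange in both directions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d_k):
    # Each query token attends over all context tokens:
    # scores = Q @ K^T / sqrt(d_k), output = softmax(scores) @ V
    scores = query @ context.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ context

def bidirectional_fuse(text, knowledge, d_k=64):
    # Text attends to knowledge AND knowledge attends to text,
    # then a residual add stands in for the full Transformer sublayer.
    t2k = cross_attention(text, knowledge, d_k)
    k2t = cross_attention(knowledge, text, d_k)
    return text + t2k, knowledge + k2t

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 64))   # 5 question tokens, dim 64
kb = rng.standard_normal((8, 64))     # 8 retrieved knowledge entries
fused_text, fused_kb = bidirectional_fuse(text, kb)
print(fused_text.shape, fused_kb.shape)  # (5, 64) (8, 64)
```

Because attention runs in both directions, each stream is updated with information from the other, which is what lets external knowledge condition the answer prediction rather than being appended as static context.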