Visual Question-answer Technology of Dual Attention Fusion Based on Relational Perception
Abstract: Traditional visual question answering (VQA) methods rely on simple positional attention alone and lack semantic attention, which leads to errors in question reasoning. This paper applies a dual attention mechanism to the image to extract both positional and semantic information, and fuses the two in the form of an outer product. Dual attention is likewise applied to the question text to fuse entity information with the corresponding relation information, which helps the model understand the question. Because the dual attention operates dynamically, it supports relation fusion and dynamic learning, in contrast to the traditional static learning approach. Answers are inferred with a multi-label classifier, which reduces the contingency introduced by traditional binary classification. The model is validated on the VQA 2.0, VQA-CP v2, and Visual Genome datasets, and the results show that the proposed method effectively improves the accuracy of answer inference.
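The abstract names three building blocks: attention over image features, outer-product fusion of the positional and semantic results, and a multi-label answer classifier. The sketch below is one minimal way to wire these together, not the paper's implementation. It assumes plain dot-product attention (the abstract does not specify the attention form), uses random numbers in place of learned features and weights, and all function names (`attend`, `outer_product_fuse`, `multi_label_scores`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Dot-product attention (an assumption): weight each region
    feature by its similarity to the question vector, then sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)]


def outer_product_fuse(a, b):
    """Flattened outer product: one entry per pair a[i] * b[j]."""
    return [x * y for x in a for y in b]


def multi_label_scores(fused, weight, bias):
    """Multi-label head: an independent sigmoid per candidate answer,
    so several answers can score highly at once."""
    scores = []
    for row, b in zip(weight, bias):
        z = sum(w * x for w, x in zip(row, fused)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


# Toy data standing in for learned features and weights (hypothetical sizes).
rng = random.Random(0)
num_regions, dim, num_answers = 4, 3, 5


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


position_feats = [rand_vec(dim) for _ in range(num_regions)]
semantic_feats = [rand_vec(dim) for _ in range(num_regions)]
question = rand_vec(dim)

pos_vec = attend(position_feats, question)    # positional attention branch
sem_vec = attend(semantic_feats, question)    # semantic attention branch
fused = outer_product_fuse(pos_vec, sem_vec)  # length dim * dim

weight = [rand_vec(dim * dim) for _ in range(num_answers)]
bias = [0.0] * num_answers
answer_scores = multi_label_scores(fused, weight, bias)
```

The outer product keeps every pairwise interaction between the two attended vectors, at the cost of a fused vector whose length grows as the product of the two input dimensions. The sigmoid head scores each candidate answer independently, which is what distinguishes a multi-label classifier from a single yes/no decision.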
Author: ZHANG Wei (张伟), Institute of Science and Technology, Changzhou Open University, Changzhou 213000, China
Source: Journal of Nanjing Institute of Technology (Natural Science Edition), 2021, No. 3, pp. 80-84 (5 pages)
Keywords: relationship perception; dual attention; visual question answering