Abstract
Traditional visual question answering (VQA) methods rely on simple positional attention and lack semantic attention, which leads to errors in question reasoning. This paper applies a dual attention mechanism to extract positional and semantic information from the image and fuses the two by an outer product. Dual attention is likewise applied to the question text to fuse entity information with the corresponding relation information, which helps the model understand the question. The dynamic dual-attention approach supports relation fusion and dynamic learning, in place of the traditional static learning approach. Answers are inferred with a multi-label classifier, which reduces the contingency introduced by traditional binary classification. The model is validated on the VQA 2.0, VQA-CP v2 and Visual Genome datasets, and the results show that the proposed method effectively improves the accuracy of answer inference.
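The pipeline described in the abstract (two attention branches, outer-product fusion, multi-label answer scoring) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the attention form, so plain dot-product attention is assumed here, and every function name, feature value and weight below is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Assumed dot-product attention: weight each feature vector by its
    similarity to the query, then return the weighted sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)]


def outer_product_fuse(a, b):
    """Flattened outer product: one entry per pair (a[i], b[j])."""
    return [x * y for x in a for y in b]


def multi_label_scores(fused, weight_rows, biases):
    """Independent sigmoid score per candidate answer (multi-label),
    rather than a single yes/no decision."""
    scores = []
    for row, bias in zip(weight_rows, biases):
        logit = sum(w * x for w, x in zip(row, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Hypothetical toy inputs: two image regions, each with a positional and a
# semantic feature vector, and a question embedding used as the query.
position_feats = [[1.0, 0.0], [0.0, 1.0]]
semantic_feats = [[0.5, 0.5], [1.0, -1.0]]
question = [1.0, 0.0]

pos_vec = attend(position_feats, question)   # positional attention branch
sem_vec = attend(semantic_feats, question)   # semantic attention branch
fused = outer_product_fuse(pos_vec, sem_vec)

# Hypothetical classifier weights for three candidate answers.
weight_rows = [[0.1] * 4, [-0.2] * 4, [0.3] * 4]
biases = [0.0, 0.0, 0.0]
answer_scores = multi_label_scores(fused, weight_rows, biases)
```

Because each candidate answer receives its own sigmoid score, several answers can be scored as plausible at once, which is the property the abstract contrasts with binary classification.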
Author
张伟
ZHANG Wei (Institute of Science and Technology, Changzhou Open University, Changzhou 213000, China)
Source
《南京工程学院学报(自然科学版)》
2021, No. 3, pp. 80-84 (5 pages)
Journal of Nanjing Institute of Technology (Natural Science Edition)
Keywords
relationship perception
dual attention
visual question answering