Abstract
To address the lack of a reasoning relationship between questions and images in Visual Question Answering (VQA) algorithms, this paper proposes a VQA algorithm based on a Question-Guided Image Attention (QGIA) mechanism that enhances the useful information in the question. During question feature extraction, the algorithm filters keywords so that attention concentrates on the informative parts of the question. It also strengthens attention to image attribute features, which enriches the image information. With both the question and the image enhanced, the image features are guided by the effective question features to answer the question, improving the effectiveness of the VQA algorithm. Experiments on the VQA V2.0 dataset give an accuracy of 67.89%. The conclusions provide theoretical support for implementing visual question answering technology.
Authors
陈婷
王玉德
任志伟
CHEN Ting; WANG Yude; REN Zhiwei (Qufu Normal University, Qufu, Shandong 273165, China)
Source
《通信技术》
2022, No. 2, pp. 166-173 (8 pages)
Communications Technology
Funding
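Shandong Province Graduate Supervisor Mentoring Capacity Enhancement Program (SDYY18119)
Shandong Province Graduate Teaching Case Library Construction Project (SDYAL21090)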
Keywords
visual question answering
reasoning relationship
question-guided image attention
attention mechanism