Research on Visual Question Answering Based on Graph Convolutional Network
Abstract: With the steady progress of computer vision and natural language processing, visual question answering (VQA) has become an important research direction in computer science. VQA requires cross-modal understanding and reasoning over images and text. Because the nodes and edges of a graph are highly correlated and the graph itself is connected, graph structures have some potential to improve the reasoning ability of VQA models. This paper therefore proposes a VQA method based on graph convolutional networks. First, neural networks extract image features and text features separately. Next, a graph processing module converts the preprocessed image and text into graph-structured data. Finally, a model based on a graph convolutional network is designed, trained, and used for answer prediction. Comparative experiments against models such as ReasonNet and BottomUp on the VQA2.0 dataset show that the method improves accuracy on the visual question answering task.
Authors: GONG An (龚安); DING Lei (丁磊); YAO Xinjie (姚鑫杰) (China University of Petroleum, Qingdao 266580)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2022, No. 1, pp. 135-139 (5 pages)
Funding: Supported by the National Major Oil and Gas Project (No. 2017ZX05013-001).
Keywords: visual question answering; graph convolutional network; computer vision; natural language processing; word vectors
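
The pipeline outlined in the abstract (feature extraction, graph construction, graph convolution, answer prediction) can be illustrated with a minimal, dependency-free sketch. The abstract does not give the paper's architecture, so everything here is an assumption made for illustration: the function names, the k-nearest-neighbor graph construction, the concatenation of the question vector onto each region feature, the two-layer depth, mean pooling, and the random weights are all hypothetical. Only the layer rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), is the standard graph convolution of Kipf and Welling, which may differ from the variant the authors used.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then normalize symmetrically."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[a_hat[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, h), w)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_graph(region_feats, question_vec, k=2):
    """Hypothetical graph construction (not taken from the paper): one node per
    image region, with the question vector concatenated onto the region feature,
    and undirected edges from each node to its k most similar nodes."""
    nodes = [list(r) + list(question_vec) for r in region_feats]
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(nodes[i], nodes[j]), reverse=True)
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return nodes, adj


def predict_answer(region_feats, question_vec, w1, w2, w_out):
    """Two GCN layers, mean pooling over nodes, then softmax over candidate answers."""
    nodes, adj = build_graph(region_feats, question_vec)
    a_norm = normalize_adjacency(adj)
    h = gcn_layer(a_norm, nodes, w1)
    h = gcn_layer(a_norm, h, w2)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logits = matmul([pooled], w_out)[0]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; stands in for trained parameters in this sketch."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    regions = random_matrix(5, 8, rng)                  # 5 regions, 8-dim features
    question = [rng.gauss(0.0, 0.1) for _ in range(4)]  # 4-dim question vector
    probs = predict_answer(regions, question,
                           random_matrix(12, 16, rng),
                           random_matrix(16, 16, rng),
                           random_matrix(16, 3, rng))   # 3 candidate answers
    print(probs)
```

In practice the region features would come from an image network and the question vector from word vectors passed through a text encoder, as the abstract describes, and the weights would be learned on VQA2.0 rather than sampled at random.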