Image Question Answering Based on the Multi-head Self-attention Mechanism and Co-attention
Abstract: To obtain a more fine-grained image representation and avoid losing key information during image feature extraction, this paper uses an image feature extraction model that incorporates a multi-head self-attention mechanism. Self-attention is applied to the question text, and the result is used to guide attention over the image. This strengthens the correlation between the question text features and the image features and extracts the information in the image features that is relevant to the question. The resulting image features and question features are then combined by multi-modal feature fusion, and the fused features are classified to predict the answer. Experiments show an overall accuracy of 64.6% on the VQA1.0 dataset and 63.9% on the VQA2.0 dataset, which supports the effectiveness of the method and is an improvement over several classic methods.
Authors: GUAN Wei (官巍); ZHANG Han (张晗); MA Li (马力) (School of Computer Science, Xi'an University of Posts & Telecommunications, Xi'an 710121)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 6, pp. 1291-1296 (6 pages)
Keywords: image question answering; attention mechanism; multi-modal fusion; deep neural network
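
The pipeline summarized in the abstract (multi-head self-attention over image features, self-attention over the question, question-guided image attention, multi-modal fusion, classification) can be illustrated with a small sketch. This is not the authors' implementation: the record gives no architectural details, so every function name, the mean pooling of question tokens, the element-wise product used for fusion, the dimensions, and the random untrained weights below are assumptions chosen only to show how the stages might connect.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over the rows of x, one column slice per head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0]) // num_heads
    merged = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for i, q_row in enumerate(q):
            scores = [sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / math.sqrt(d_head)
                      for k_row in k]
            weights = softmax(scores)
            # Append this head's output slice; heads end up concatenated per row.
            merged[i].extend(sum(w * v[j][c] for j, w in enumerate(weights))
                             for c in range(lo, hi))
    return matmul(merged, w_o)

def question_guided_attention(image_feats, question_vec):
    """Weight image regions by their scaled dot product with the question vector."""
    scale = math.sqrt(len(question_vec))
    scores = [sum(a * b for a, b in zip(row, question_vec)) / scale for row in image_feats]
    weights = softmax(scores)
    attended = [sum(w * row[c] for w, row in zip(weights, image_feats))
                for c in range(len(question_vec))]
    return attended, weights

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, cols ** -0.5) for _ in range(cols)] for _ in range(rows)]

def vqa_forward(image_regions, question_tokens, params, num_heads=2):
    img = multi_head_self_attention(image_regions, *params["img_attn"], num_heads)
    qst = multi_head_self_attention(question_tokens, *params["qst_attn"], num_heads)
    q_vec = [sum(col) / len(qst) for col in zip(*qst)]   # mean pooling (assumption)
    v_vec, region_weights = question_guided_attention(img, q_vec)
    fused = [a * b for a, b in zip(v_vec, q_vec)]        # element-wise fusion (assumption)
    logits = matmul([fused], params["w_cls"])[0]
    return softmax(logits), region_weights

rng = random.Random(0)
d, num_answers = 8, 4  # toy sizes, not taken from the paper
params = {
    "img_attn": [random_matrix(d, d, rng) for _ in range(4)],
    "qst_attn": [random_matrix(d, d, rng) for _ in range(4)],
    "w_cls": random_matrix(d, num_answers, rng),
}
image_regions = random_matrix(6, d, rng)    # 6 hypothetical image region features
question_tokens = random_matrix(5, d, rng)  # 5 hypothetical question token embeddings
answer_probs, region_weights = vqa_forward(image_regions, question_tokens, params)
```

Here `answer_probs` is a probability distribution over the candidate answers and `region_weights` shows how strongly each image region was attended to for the given question. In a trained model the weight matrices would be learned and the inputs would come from real image and text encoders.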