Visual Question Answer Transmission Attention Network Based on Multi-Modal Fusion
Abstract: Traditional visual question answering (VQA) methods cannot fully capture the complex correlations between multi-modal features. To address this shortcoming, this study proposes a visual question answering transmission attention network based on multi-modal fusion. For feature extraction, GloVe word embeddings followed by an LSTM are used to extract question features, and a ResNet-152 network is used to extract image features. Multi-modal fusion is performed by a 3-layer transmission attention network that learns a global multi-modal embedding, which is then used to recalibrate the input features. In addition, a multi-modal transmission attention learning architecture is designed: by performing overlapping computations over the transmission network, the combined features focus on the fine-grained parts of the image and the question, which improves the accuracy of the predicted answers. Experiments on the VQA v1.0 dataset show that the model reaches an overall accuracy of 69.92%, higher than the accuracy of five other mainstream VQA models, which demonstrates the effectiveness and robustness of the model.
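The abstract does not give the equations of the transmission attention layer. The following is a minimal NumPy sketch that assumes an MMTM-style (Multimodal Transfer Module) design, a swapped-in technique rather than the authors' confirmed formulation: each layer averages the image regions into a global vector, concatenates it with the question vector to form a joint multi-modal embedding, and maps that embedding to per-channel gates that recalibrate both inputs; three such layers are stacked. The function names, the dimensions (196 regions x 2048 channels from a 14x14 ResNet-152 feature map, a 1024-d LSTM question vector, a 256-d joint embedding), and the 2*sigmoid gating are assumptions, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical dimensions (assumptions, not taken from the paper):
# 196 image regions (14x14 ResNet-152 grid) x 2048 channels,
# a 1024-d LSTM question vector, and a 256-d joint embedding.
N_REGIONS, D_V, D_Q, D_Z = 196, 2048, 1024, 256


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_layer(rng, d_q, d_v, d_z):
    """Random (untrained) weights for one hypothetical transfer layer."""
    return {
        "W_z": rng.standard_normal((d_z, d_q + d_v)) / np.sqrt(d_q + d_v),
        "W_q": rng.standard_normal((d_q, d_z)) / np.sqrt(d_z),
        "W_v": rng.standard_normal((d_v, d_z)) / np.sqrt(d_z),
    }


def transfer_layer(q, V, p):
    """One MMTM-style step: joint embedding -> per-channel gates -> recalibration.

    q: (d_q,) question feature; V: (n, d_v) image region features.
    """
    v_global = V.mean(axis=0)  # squeeze all regions into one global image vector
    # Global multi-modal embedding from both modalities (ReLU nonlinearity).
    z = np.maximum(0.0, p["W_z"] @ np.concatenate([q, v_global]))
    gate_q = 2.0 * sigmoid(p["W_q"] @ z)  # per-channel gates in (0, 2)
    gate_v = 2.0 * sigmoid(p["W_v"] @ z)
    return q * gate_q, V * gate_v  # gate_v broadcasts over every region


def transmission_attention(q, V, layers):
    """Apply the transfer layers in sequence (three here, as in the abstract)."""
    for p in layers:
        q, V = transfer_layer(q, V, p)
    return q, V


rng = np.random.default_rng(0)
q_in = rng.standard_normal(D_Q)               # stand-in for the LSTM question vector
V_in = rng.standard_normal((N_REGIONS, D_V))  # stand-in for ResNet-152 region features
layers = [init_layer(rng, D_Q, D_V, D_Z) for _ in range(3)]
q_out, V_out = transmission_attention(q_in, V_in, layers)
print(q_out.shape, V_out.shape)  # (1024,) (196, 2048)
```

The recalibration only rescales channels, so both outputs keep their input shapes. In a full model the recalibrated question vector and image features would then be combined and passed to an answer classifier; that stage is omitted from this sketch.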
Authors: WANG Mao, PENG Yaxiong, LU Anjiang (College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China)
Source: Electronic Science and Technology, 2022, No. 12, pp. 72-77 (6 pages)
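Funding: Major Science and Technology Special Project of Guizhou Province ([2016]3022); Science and Technology Achievement Transformation Project of Guizhou Province ([2017]4856).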
Keywords: visual question answering; multi-modal features; combined features; multi-modal embedding; attention; transmission network; fine-grained; multi-modal fusion