Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering (cited by: 2)


Abstract: Image question answering (IQA) has emerged as a promising interdisciplinary topic spanning computer vision and natural language processing. In this paper, we propose a contextually guided recurrent attention model for IQA. The model is a multimodal recurrent neural network trained with deep reinforcement learning. Guided by compositional contextual information, it recurrently decides where to look using a reinforcement learning strategy. Unlike traditional 'static' soft attention, it is a kind of 'dynamic' attention whose objective is built on reinforcement rewards designed specifically for IQA. The compositional information it finally learns incorporates both global context and local informative details, which is shown to benefit answer generation. The proposed method is compared with several state-of-the-art methods on two public IQA datasets, COCO-QA and VQA, both derived from the MS COCO dataset. The experimental results show that the proposed model outperforms those methods.
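The contrast the abstract draws between 'static' soft attention and reward-driven 'dynamic' attention can be illustrated with a toy sketch. This is not the authors' network: the function names, the four-region setup, the 0/1 reward, and the plain REINFORCE update are all assumptions made for illustration. Soft attention averages over every region, while the sampled policy picks a single region and is pushed toward regions that earn reward.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of region scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores, features):
    """'Static' soft attention: weighted average of all region features."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def sample_region(scores, rng):
    """'Dynamic' attention: sample one region to look at from the policy."""
    probs = softmax(scores)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_gradient(scores, action, reward, baseline=0.0):
    """REINFORCE estimate: (reward - baseline) * d log pi(action) / d scores."""
    probs = softmax(scores)
    advantage = reward - baseline
    return [((1.0 if i == action else 0.0) - p) * advantage
            for i, p in enumerate(probs)]


def train_toy_policy(informative_region=2, num_regions=4,
                     steps=500, lr=0.5, seed=0):
    """Hypothetical toy task: only one region yields a reward of 1."""
    rng = random.Random(seed)
    scores = [0.0] * num_regions
    for _ in range(steps):
        action = sample_region(scores, rng)
        reward = 1.0 if action == informative_region else 0.0
        grad = reinforce_gradient(scores, action, reward)
        # Gradient ascent on expected reward.
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return softmax(scores)


if __name__ == "__main__":
    probs = train_toy_policy()
    print("learned look-at probabilities:", [round(p, 3) for p in probs])
```

In the paper's setting the reward would presumably depend on answer quality and the scores would come from a recurrent network conditioned on the question and image context; here both are replaced by fixed toy stand-ins.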

Authors: Ai-Wen Jiang, Bo Liu, Ming-Wen Wang

Affiliations: College of Computer and Information Engineering; College of Computer Science and Software Engineering

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD; 2017, No. 4, pp. 738-748 (11 pages)

Keywords: image question answering; recurrent attention; deep reinforcement learning; multimodal recurrent neural network; multimodal fusion

Classification: TP [Automation and Computer Technology]
