Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering (cited by: 2)
Authors: Ai-Wen Jiang, Bo Liu, Ming-Wen Wang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, Issue 4, pp. 738-748 (11 pages).
Image question answering (IQA) has emerged as a promising interdisciplinary topic spanning computer vision and natural language processing. In this paper, we propose a contextually guided recurrent attention model for IQA. It is a multimodal recurrent neural network based on deep reinforcement learning. Based on compositional contextual information, it recurrently decides where to look using a reinforcement learning strategy. Unlike traditional 'static' soft attention, it is a kind of 'dynamic' attention whose objective is designed around reinforcement rewards tailored to IQA. The compositional information finally learned incorporates both global context and local informative details, which is shown to benefit answer generation. The proposed method is compared with several state-of-the-art methods on two public IQA datasets, COCO-QA and VQA, both drawn from MS COCO. The experimental results show that the proposed model outperforms those methods.
Keywords: image question answering; recurrent attention; deep reinforcement learning; multimodal recurrent neural network; multimodal fusion
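
The abstract contrasts 'static' soft attention with 'dynamic' attention trained from reinforcement rewards. The toy sketch below illustrates that general distinction only and is not the authors' model. Soft attention averages all region features, while a hard-attention policy samples one region to look at and is updated with a plain REINFORCE gradient. All names (`soft_attention`, `sample_glimpse`, `reinforce_grad`, `train_policy`), the three-region setup, and the 0/1 reward standing in for answer correctness are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """'Static' soft attention: probability-weighted average of all regions."""
    probs = softmax(scores)
    dim = len(features[0])
    return [sum(p * f[d] for p, f in zip(probs, features)) for d in range(dim)]


def sample_glimpse(scores, rng):
    """Hard attention: sample a single region index from the policy."""
    probs = softmax(scores)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_grad(probs, idx, reward, baseline=0.0):
    """REINFORCE estimate of d E[reward] / d scores for a softmax policy:
    (reward - baseline) * (one_hot(idx) - probs)."""
    return [
        (reward - baseline) * ((1.0 if i == idx else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def train_policy(num_regions=3, informative=2, steps=200, lr=0.5, seed=0):
    """Toy loop: reward is 1 only when the hypothetical informative
    region is attended, a stand-in for 'the answer was correct'."""
    rng = random.Random(seed)
    scores = [0.0] * num_regions
    for _ in range(steps):
        idx, probs = sample_glimpse(scores, rng)
        reward = 1.0 if idx == informative else 0.0
        grad = reinforce_grad(probs, idx, reward)
        # Gradient ascent on expected reward.
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return softmax(scores)


if __name__ == "__main__":
    print(soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print(train_policy())
```

In this sketch the policy starts uniform and shifts probability toward the rewarded region over training. A real system would condition the scores on image and question features and would typically use a learned baseline to reduce gradient variance.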