Abstract
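Automatic image captioning uses a computer to generate sentences that describe the content of an image, and is widely applied in fields such as service robotics. Researchers have proposed a number of attention-based algorithms, but attention distraction, and the disordered sentences it causes, have not yet been adequately resolved. This work introduces an attention feedback mechanism on top of the conventional attention mechanism: image features carrying attention information guide text generation, while the attention information in the generated text is used in turn to correct the attended regions of the image. This process repeatedly reinforces the matching of key information between image and text and refines the generated sentence. Experiments on the widely used Flickr8k, Flickr30k and MSCOCO datasets show that the model alleviates attention distraction and disordered word order to some extent, marks attended regions more accurately than other attention-based methods, and generates more fluent sentences.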
Image captioning aims to have a machine generate a sentence relevant to a given image, and has been applied to service robots. To improve captioning performance, some researchers have proposed leveraging the attention mechanism. However, this mechanism often suffers from distraction and sentence disorder. In this paper, we propose an image captioning model based on a novel feedback attention mechanism. When generating the language for a given image, the proposed model uses attention feedback from the generated language. With this feedback, the attention heatmap of the original image is revised, and the generated sentence improves as well. We evaluate the proposed method on three benchmark datasets, Flickr8k, Flickr30k and MSCOCO, and the experimental results show that it marks attended regions more accurately and generates more fluent sentences than other attention-based methods.
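The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the dot-product scoring, the convex blend, the `mix` weight, and all function names are assumptions made for illustration only. It shows forward attention (decoder state to image regions) being revised by a second attention pass driven by the word just generated.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are non-negative and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attention(regions, query):
    """Forward attention: weight each image region by its match to the query."""
    return softmax([dot(r, query) for r in regions])

def feedback_attention(regions, state, word, mix=0.5):
    """Revise the forward attention using the word just generated.

    `mix` is a hypothetical blending weight, not a value from the paper.
    """
    forward = attention(regions, state)   # image-to-text direction
    feedback = attention(regions, word)   # text-to-image direction
    # A convex blend of two distributions is still a distribution.
    return [(1 - mix) * f + mix * b for f, b in zip(forward, feedback)]

# Three toy region features. The decoder state is uninformative,
# but the generated word matches region 1.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.0, 0.0, 0.0]
word = [0.0, 5.0, 0.0]

forward = attention(regions, state)
revised = feedback_attention(regions, state, word)
print(forward)
print(revised)
```

In this toy setting the forward attention is spread evenly over the three regions, and the feedback pass shifts weight toward the region that matches the generated word, which is the kind of correction the abstract attributes to the feedback mechanism.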
Authors
吕凡
胡伏原
张艳宁
夏振平
盛胜利
Lyu Fan; Hu Fuyuan; Zhang Yanning; Xia Zhenping; Victor S Sheng (School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou 215009; Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou 215009; School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710029; Department of Computer Science, University of Central Arkansas, Conway AR 72035; College of Intelligence and Computing, Tianjin University, Tianjin 300072; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou 215009)
Source
《计算机辅助设计与图形学学报》
EI
CSCD
Peking University Core Journal (北大核心)
2019, Issue 7, pp. 1122-1129 (8 pages)
Journal of Computer-Aided Design & Computer Graphics
Funding
National Natural Science Foundation of China (61876121, 61472267, 61728205, 61502329)
Key Research and Development Program of Jiangsu Province (BE2017663)