Feedback Attention Model for Image Captioning (面向图像自动语句标注的注意力反馈模型)
Cited by: 5
Abstract: Image captioning uses a computer to automatically generate sentences that describe the content of an image, and it is widely applied in areas such as service robots. Many attention-based algorithms have been proposed, but attention distraction, and the disordered sentences it causes, have not yet been resolved well. Building on the conventional attention mechanism, this paper introduces an attention feedback mechanism: the attended image features guide text generation, and the attended information in the generated text is in turn used to revise the attended regions (the attention heatmap) of the image. Repeating this process strengthens the matching of key information between image and text and improves the generated sentence. Experiments on three commonly used benchmark datasets, Flickr8k, Flickr30k and MSCOCO, show that the model alleviates attention distraction and sentence disorder to some extent, localizes attended regions more accurately than other attention-based methods, and generates more fluent sentences.
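The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the abstract does not give the scoring function or how the feedback is combined, so the dot-product scores, the `feedback_revise` helper, and the `mix` blending weight are all assumptions made purely for illustration. The sketch only shows the general idea of soft attention over image regions followed by a revision of the attention weights using a generated word.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(regions, query):
    """Soft attention: weight each region feature by its similarity to the query."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context

def feedback_revise(regions, weights, word_vec, mix=0.5):
    """Hypothetical feedback step (an assumption, not the paper's formula):
    re-score the regions against the generated word and blend the result
    with the previous attention weights."""
    text_weights = softmax([dot(r, word_vec) for r in regions])
    blended = [(1 - mix) * w + mix * t for w, t in zip(weights, text_weights)]
    total = sum(blended)
    return [b / total for b in blended]

# Toy data: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [0.2, 0.1]    # stand-in for the decoder state before generating a word
weights, context = attend(regions, hidden)

word_vec = [0.0, 2.0]  # toy embedding of a generated word that matches region 1
revised = feedback_revise(regions, weights, word_vec)
```

In this toy setup the initial attention is spread almost evenly across the three regions, and the feedback step shifts weight toward the region that agrees with the generated word. A real captioning model would learn both scoring functions and repeat this step at every decoding position.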
Authors: Lyu Fan (吕凡); Hu Fuyuan (胡伏原); Zhang Yanning (张艳宁); Xia Zhenping (夏振平); Victor S Sheng (盛胜利)
Affiliations: School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou 215009; Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou 215009; School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710029; Department of Computer Science, University of Central Arkansas, Conway, AR 72035; College of Intelligence and Computing, Tianjin University, Tianjin 300072; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou 215009
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2019, No. 7, pp. 1122-1129 (8 pages). Indexed in: EI, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (61876121, 61472267, 61728205, 61502329); Key Research and Development Program of Jiangsu Province (BE2017663)
Keywords: image captioning; attention mechanism; attention feedback