
Knowledge-aided Image Captioning (基于知识辅助的图像描述生成)
Abstract: Automatically generating a human-like description for a given image is one of the important tasks in artificial intelligence. Most existing attention-based methods explore the mapping between words in a sentence and regions in an image. This matching is hard to predict, however, and sometimes produces inconsistent alignments between the two modalities, which lowers the quality of the generated captions. To address this problem, this paper proposes a text-related word attention that improves the correctness of visual attention. This word attention emphasizes the importance of different words as the model generates the caption word by word, and it makes full use of the internal annotation knowledge in the training data to assist the calculation of visual attention. Furthermore, to reveal implicit information in the image that machines cannot express directly, knowledge extracted from external knowledge graphs is injected into the encoder-decoder framework to generate more novel and natural captions. Experiments on the MSCOCO and Flickr30k image captioning benchmarks show that the method achieves good performance and outperforms many existing state-of-the-art approaches.
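The abstract gives no formulas, so the following is only a minimal sketch of the general idea of letting an attention over previously generated words inform the attention over image regions. It is not the authors' formulation: the function name `word_guided_visual_attention`, the scaled dot-product scoring, and the additive combination of decoder state and word context are all assumptions made for illustration, and the inputs are random placeholders.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def word_guided_visual_attention(word_embs, regions, hidden):
    """Hypothetical sketch, not the paper's model.

    word_embs: (T, d) embeddings of the words generated so far
    regions:   (K, d) image region features
    hidden:    (d,)   current decoder state
    """
    scale = np.sqrt(hidden.size)
    # Word attention: weight earlier words by relevance to the decoder state.
    word_weights = softmax(word_embs @ hidden / scale)
    word_context = word_weights @ word_embs          # (d,)
    # Visual attention: the query mixes decoder state and word context
    # (additive mixing is an assumption chosen for simplicity).
    query = hidden + word_context
    region_weights = softmax(regions @ query / scale)
    visual_context = region_weights @ regions        # (d,)
    return word_weights, region_weights, visual_context

# Random placeholder inputs: 3 previous words, 5 regions, feature size 8.
rng = np.random.default_rng(0)
word_embs = rng.normal(size=(3, 8))
regions = rng.normal(size=(5, 8))
hidden = rng.normal(size=8)
w, r, ctx = word_guided_visual_attention(word_embs, regions, hidden)
```

Here `w` and `r` are probability distributions over the previous words and the image regions respectively, and `ctx` is the attended visual feature a decoder would consume at the next step.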
Authors: LI Zhixin (李志欣); SU Qiang (苏强) (Guangxi Key Lab of Multi-source Information Mining and Security (Guangxi Normal University), Guilin, Guangxi 541004, China)
Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2022, No. 5, pp. 418-432 (15 pages)
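Funding: National Natural Science Foundation of China (61966004, 61866004); Guangxi Natural Science Foundation (2019GXNSFDA245018); Guangxi "Bagui Scholar" Program special fund.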
Keywords: image captioning; internal knowledge; external knowledge; word attention; knowledge graph; reinforcement learning