A deep dense captioning framework with joint localization and contextual reasoning (Chinese title: 基于联合定位和上下文推理的深度密集描述方法)
Abstract: Dense captioning aims to automatically detect regions of interest (RoIs) in an image and describe the semantic content of each region with a natural-language phrase or sentence. The task poses three main challenges: 1) RoIs are dense and highly overlapping, which makes accurate localization of each target region difficult; 2) some target regions are visually ambiguous and hard to recognize by appearance alone; 3) the depth of the image feature representation is of central importance for visual recognition. To address these challenges, we propose an end-to-end dense captioning framework with three key components: a joint localization module, a contextual reasoning module, and a deep convolutional neural network (CNN). We evaluate five deep CNN structures to explore the benefits of each. Experiments on the Visual Genome (VG) dataset demonstrate the effectiveness of the approach, which compares favorably with state-of-the-art methods.
Authors: KONG Rui (孔锐); XIE Wei (谢玮) (School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China)
Source: Journal of Central South University (中南大学学报(英文版)); indexed in SCIE, EI, CAS, CSCD; 2021, Issue 9, pp. 2801-2813 (13 pages)
Funding: Project (2020A1515010718) supported by the Basic and Applied Basic Research Foundation of Guangdong Province, China.
Keywords: dense captioning; joint localization; contextual reasoning; deep convolutional neural network