Visual Scene Description and Its Performance Evaluation (视觉场景描述及其效果评价)

Cited by: 5
Abstract: Visual scene description is a research topic at the intersection of computer vision, multimedia, artificial intelligence, and natural language processing. Its goal is to automatically generate one or more sentences that describe the visual scene presented in an image or a video clip. The richness of visual content and the diversity of natural-language expression make this a challenging task. This paper reviews existing visual scene description methods and how their performance is evaluated. First, it defines visual scene description, its research tasks, and a taxonomy of methods, and briefly analyzes how it relates to multi-modal retrieval, cross-modal learning, scene classification, visual relationship detection, and other related technologies. It then discusses the main methods, models, and research progress in three categories and summarizes the growing number of benchmark datasets. Next, it reviews the main metrics used for objective evaluation, along with the problems and challenges the field faces. Finally, it discusses prospects for future applications.
Authors: MA Miao (马苗); WANG Bo-Long (王伯龙); WU Qi (吴琦); WU Jie (武杰); GUO Min (郭敏)
Affiliations: Key Laboratory of Modern Teaching Technology of Ministry of Education (Shaanxi Normal University), Xi'an 710062, China; School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
Source: Journal of Software (《软件学报》), 2019, No. 4, pp. 867-883 (17 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
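Funding: National Natural Science Foundation of China (61877038, 61801282, 61601274); Natural Science Foundation of Shaanxi Province (2018JM6068); Fundamental Research Funds for the Central Universities (GK201703054, GK201703058)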
Keywords: deep learning; image captioning; video captioning; benchmark dataset; performance evaluation