Abstract
Image captioning takes an image as input and uses mathematical models and computation to make a computer output a natural-language description of that image. This gives computers the ability to "talk about pictures". After image recognition, image segmentation and object tracking, it is another new task in image processing and computer vision. Following the development of image captioning, this paper surveys the task's methods, evaluation metrics and commonly used datasets in detail. On the methods side, it summarizes template-based, retrieval-based and deep learning-based caption generation. It focuses on the various deep learning-based approaches and summarizes and discusses their experimental results. It then describes the evaluation metrics used for the task, how each is computed, and the datasets commonly used. Finally, it identifies open problems in the task and directions for future research.
Authors
MIAO Yi
ZHAO Zeng-shun
YANG Yu-lu
XU Ning
YANG Hao-ran
SUN Qian
College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
School of Control Science and Engineering, Shandong University, Jinan 250061, China
Department of Electrical & Computer Engineering, University of Florida, Gainesville, Florida 32611, USA
Source
Computer Science
CSCD
Peking University Core Journal
2020, No. 12, pp. 149-160 (12 pages)
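Funding
National Natural Science Foundation of China (61403281)
China Postdoctoral Science Foundation (2015T80717)
Natural Science Foundation of Shandong Province (ZR2014FM002)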
Keywords
Image processing
Image captioning
Deep learning
Computer vision
Natural language processing