Abstract
Aiming at the problem that existing image captioning models based on visual attention or textual attention cannot attend to image details and the image as a whole at the same time, this paper proposes an evolutionary deep learning model for image captioning (EDLMIC). The model consists of three sub-modules: an image encoder, an evolutionary neural network, and an adaptive fusion decoder. It effectively fuses visual and textual information and automatically computes the proportion of the two at each time step, so that captions for a given image are generated from the fused visual-textual representation. Experimental results on two public datasets, Flickr30K and COCO2014, show that EDLMIC outperforms the baseline models on four metrics, namely METEOR, ROUGE-L, CIDEr and SPICE, and performs well in a variety of real-life scenes.
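The per-time-step weighting of visual and textual information described above can be illustrated with a minimal sketch, assuming a sigmoid gate over concatenated visual and textual context vectors (PyTorch); all class, parameter, and variable names here are hypothetical, since the abstract does not give EDLMIC's exact formulation.

# Minimal sketch of an adaptive visual-textual fusion step (assumed formulation).
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Computes a per-time-step gate that mixes visual and textual context."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Scores the two contexts jointly and produces a single gate value.
        self.gate = nn.Linear(2 * hidden_dim, 1)

    def forward(self, visual_ctx: torch.Tensor, text_ctx: torch.Tensor) -> torch.Tensor:
        # visual_ctx, text_ctx: (batch, hidden_dim) at the current decoding step.
        beta = torch.sigmoid(self.gate(torch.cat([visual_ctx, text_ctx], dim=-1)))
        # beta is the automatically computed proportion of visual information;
        # (1 - beta) weights the textual information.
        return beta * visual_ctx + (1.0 - beta) * text_ctx

# Example: fuse contexts for a batch of 4 with hidden size 512.
fusion = AdaptiveFusion(hidden_dim=512)
fused = fusion(torch.randn(4, 512), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 512])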
Authors
Gao Xin; Sun Maosheng; Zhu Junwu (School of Information Engineering, Jiangsu College of Tourism, Yangzhou, Jiangsu 225131, China; College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Office of Informationization Construction & Administration, Yangzhou University, Yangzhou, Jiangsu 225127, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2022, Issue 3, pp. 911-918 (8 pages)
Application Research of Computers
Funding
High-end Training Program for Professional Leading Teachers of Jiangsu Higher Vocational Colleges
National Natural Science Foundation of China (61872313)
Key Project of Jiangsu Province Education Informatization Research (20180012)
Yangzhou Science and Technology Plan Project (YZ2019133, YZ2020174)
Keywords
evolutionary deep learning
image captioning
attention mechanism
computer vision
natural language processing