Abstract
To address the lack of global information in existing attention-based image captioning methods, we propose an improved image captioning method with a global attention mechanism. Building on the visual attention mechanism, the proposed method mimics the entire human perception process by designing a global feature extraction network that enhances the global features of the image. We compare the proposed method with the current state-of-the-art attention-based network under the same dataset and network hyperparameters, and analyze the influence of global information on the generated text. The results show that the proposed method outperforms the current state-of-the-art model on objective evaluation metrics for the more challenging Chinese captioning task. In subjective evaluation, the captions generated by the proposed method are also more accurate, richer, and more diverse, and are closer to natural language descriptions.
Authors
MA Shulei (马书磊)
ZHANG Guobin (张国宾)
JIAO Yang (焦阳)
SHI Guangming (石光明)
(School of Artificial Intelligence, Xidian Univ., Xi'an 710071, China; The 27th Research Institute of China Electronic Technology Group Corporation, Zhengzhou 450047, China)
Source
Journal of Xidian University (《西安电子科技大学学报》)
EI
CAS
CSCD
PKU Core
2019, No. 2, pp. 17-22 (6 pages)
Funding
National Natural Science Foundation of China (61875157, 61301288)
Keywords
image caption
attention mechanism
global feature
convolutional neural network
recurrent neural network