Abstract
To address the low overall quality of generated sentences in current image description tasks, an image description model that fuses word2vec and an attention mechanism is proposed. In the encoding stage, the word2vec model vectorizes the description text to strengthen the correlation between words; the VGGNet19 network extracts image features, and an attention mechanism is integrated into those features so that the model highlights the corresponding image features when it generates a word at each time step. In the decoding stage, a GRU network serves as the language generation model for the image description task, to improve training efficiency and the quality of the generated sentences. Experimental results on the two public datasets Flickr8k and Flickr30k show that, under the same training environment, the GRU model needs one third less training time than the LSTM model, and that the proposed model's performance improves significantly on the BLEU and METEOR evaluation metrics.
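The decoding step described in the abstract (attend over image regions, then update a GRU state) can be illustrated with a minimal sketch. It is not the authors' implementation: the additive attention form, all dimensions, the random weights, and the names `attend`, `gru_step`, and `rand_matrix` are assumptions made for illustration. Random vectors stand in for the VGGNet19 region features and the word2vec embedding.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(features, hidden, W_f, W_h, v):
    """Additive soft attention (assumed form): score each image region
    against the decoder state, normalize, and return the weighted context."""
    h_proj = matvec(W_h, hidden)
    scores = []
    for f in features:
        f_proj = matvec(W_f, f)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, f_proj, h_proj)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return context, weights

def gru_step(x, h, p):
    """One standard GRU update: update gate z, reset gate r, candidate state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

# Hypothetical sizes: regions, feature dim, embedding dim, hidden dim, attention dim.
R, D, E, H, A = 6, 3, 4, 5, 4
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

features = rand_matrix(R, D)            # placeholder for VGGNet19 region features
word_vec = rand_matrix(1, E)[0]         # placeholder for a word2vec embedding
hidden = [0.0] * H                      # initial decoder state
params = {k: rand_matrix(H, E + D) for k in ("Wz", "Wr", "Wh")}
params.update({k: rand_matrix(H, H) for k in ("Uz", "Ur", "Uh")})

context, weights = attend(features, hidden,
                          rand_matrix(A, D), rand_matrix(A, H),
                          rand_matrix(1, A)[0])
h_new = gru_step(word_vec + context, hidden, params)  # input = embedding ++ context
```

The attention weights form a distribution over image regions, so the context vector emphasizes the regions most relevant to the word being generated. A GRU uses two gates where an LSTM uses three plus a separate cell state, which is consistent with the shorter training time reported above.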
Authors
邓珍荣
张宝军
蒋周琴
黄文明
DENG Zhen-rong; ZHANG Bao-jun; JIANG Zhou-qin; HUANG Wen-ming (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin, Guangxi 541004, China)
Source
《计算机科学》
CSCD
Peking University Core Journals list (北大核心)
2019, Issue 4, pp. 268-273 (6 pages)
Computer Science
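Funding
Project of the Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (yf17106)
Guangxi Natural Science Foundation (2018GXNSFAA138132)
Guilin University of Electronic Technology Graduate Innovation Project (2018YJCX55)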