Abstract
Methods based on convolutional neural networks (CNN) and long short-term memory (LSTM) networks are currently the mainstream approach to image description. Although these methods have made great progress, they still suffer from high computational complexity, slow convergence, and long training times. To address these problems, an image description model based on GoogLeNet and a double-layer GRU is proposed. The adaptive moment estimation (Adam) optimization algorithm is used in the training stage, which accelerates the convergence of the overall model and improves its performance. Experimental results on the MSCOCO and Flickr30K datasets show that the proposed model outperforms commonly used image description models: the generated sentences are more accurate, and the model exceeds the other models on multiple evaluation metrics.
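For readers unfamiliar with the Adam optimizer named in the abstract, the following is a minimal, generic sketch of its update rule in plain Python. It is not the authors' training code: the function names `adam_step` and `minimize_quadratic`, the toy objective, and the hyperparameter values (the commonly cited defaults) are assumptions made only for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of float parameters.

    m and v are the running first- and second-moment estimates,
    and t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy example (hypothetical): minimize f(x) = (x - 3)^2 from x = 0."""
    x, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (x[0] - 3.0)]  # derivative of (x - 3)^2
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x[0]
```

Because each step is normalized by the running estimate of the squared gradient, the first update moves every parameter by roughly `lr` regardless of gradient scale, which is one reason Adam is often chosen to speed up early convergence.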
Authors
ZHANG Jieqing (张洁庆)
GUO Min (郭敏)
XIAO Bing (肖冰)
School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China
Source
Journal of Shaanxi Normal University (Natural Science Edition) (《陕西师范大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals
2021, Issue 1, pp. 68-73 (6 pages)
Funding
National Natural Science Foundation of China (61401265).