Abstract
Traditional Long Short-Term Memory (LSTM) neural networks suffer from premature saturation. To address this problem and generate more accurate descriptions for a given image, this paper proposes an LSTM neural network model based on the inverse tangent function (ITLSTM). First, a classical convolutional neural network model is used to extract image features. Then, the ITLSTM neural network model is used to represent the description corresponding to the image. Finally, the model is evaluated on the Flickr8K dataset and compared with several classic image caption generation models, such as Google NIC. The experimental results show that the proposed model effectively improves the accuracy of image caption generation.
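The abstract does not give the ITLSTM equations, so the following is only a minimal sketch of the saturation issue it refers to, not the authors' model. It assumes, purely for illustration, an arctangent rescaled to the range (-1, 1) as a stand-in for tanh; the function names `tanh_grad`, `scaled_atan`, and `scaled_atan_grad` are hypothetical. The gradient of tanh decays exponentially as the input grows, while the gradient of the arctangent decays only polynomially, so the arctangent saturates later.

```python
import math


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2, which decays exponentially in |x|."""
    return 1.0 - math.tanh(x) ** 2


def scaled_atan(x: float) -> float:
    """Arctangent rescaled to (-1, 1) to match tanh's range (an illustrative assumption)."""
    return (2.0 / math.pi) * math.atan(x)


def scaled_atan_grad(x: float) -> float:
    """Derivative of the rescaled arctangent: (2/pi) / (1 + x^2), decaying polynomially."""
    return (2.0 / math.pi) / (1.0 + x * x)


if __name__ == "__main__":
    # Compare how quickly each gradient shrinks as the pre-activation grows.
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"x={x:5.1f}  tanh'={tanh_grad(x):.2e}  atan'={scaled_atan_grad(x):.2e}")
```

For large pre-activations the arctangent gradient stays orders of magnitude larger than the tanh gradient, which is the kind of behavior an arctangent-based LSTM variant would rely on; how the paper actually incorporates the function into the gates or cell update is not stated in this record.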
Authors
WANG Zhi-ping (王志平); ZHENG Bao-you (郑宝友); LIU Yi-wei (刘仪伟)
College of Science, Dalian Maritime University, Dalian 116000, China
Source
Computer and Modernization (《计算机与现代化》)
2020, No. 4, pp. 37-41 (5 pages)
Funding
Supported by the Fundamental Research Funds for the Central Universities (3132019323).
Keywords
image caption generation
inverse tangent function
Long Short-Term Memory (LSTM) neural network
convolutional neural network