Abstract
Traditional encoder-decoder image captioning models for public environments have two weaknesses: the encoder extracts features poorly, and the decoder loses much of the context information. To address these problems, this paper proposes a public environment image captioning model based on Se-ResNet50 and M-LSTM. The model adds a SeNet (squeeze-and-excitation) module to the residual path of ResNet-50, and the resulting improved residual network extracts image features. SeNet assigns weights to each part of the features to generate different attention feature maps. These maps are fused with text feature vectors and fed for training into M-LSTM, an improved long short-term memory network with additional gating operations. Once trained, the model takes a public environment image as input and produces a natural-language sentence describing its content. The model was evaluated on several datasets. On the MSCOCO dataset, compared with the traditional model, the proposed model improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR and CIDEr by 3.2%, 2.1%, 1.7%, 1.7%, 1.3% and 8.2%, respectively. These results show that the method has certain advantages in evaluation metrics and semantic diversity.
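The abstract names the SeNet-in-the-residual-path design but does not give its layer configuration. Below is a minimal NumPy sketch of standard squeeze-and-excitation (SE) channel recalibration applied to a residual branch. It assumes a single unbatched feature map of shape (C, H, W), a reduction ratio r, and randomly initialized weights. The names `se_block` and `se_residual_unit`, and the 1x1-convolution stand-in used for the residual branch, are hypothetical illustrations and do not come from the paper.

```python
import numpy as np

def se_block(u, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of a feature map u with shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor per channel, shape (C,)
    z = u.mean(axis=(1, 2))
    # Excitation: bottleneck FC (C -> C/r) with ReLU, then FC (C/r -> C) with sigmoid
    h = np.maximum(0.0, w1 @ z + b1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Scale: reweight each channel of u by its attention weight in (0, 1)
    return u * s[:, None, None], s

def se_residual_unit(x, residual_fn, se_params):
    """Residual unit with SE on the residual path: relu(x + SE(F(x)))."""
    f = residual_fn(x)                      # residual branch F(x)
    f_scaled, _ = se_block(f, *se_params)   # channel attention applied to F(x) only
    return np.maximum(0.0, x + f_scaled)    # identity shortcut is left untouched

# Usage with toy sizes (hypothetical): C = 8 channels, reduction ratio r = 4
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
x = rng.standard_normal((C, H, W))
se_params = (rng.standard_normal((C // r, C)), np.zeros(C // r),
             rng.standard_normal((C, C // r)), np.zeros(C))
w_conv = rng.standard_normal((C, C)) * 0.1
residual_fn = lambda t: np.einsum("oc,chw->ohw", w_conv, t)  # stand-in 1x1 conv
y = se_residual_unit(x, residual_fn, se_params)
print(y.shape)  # (8, 5, 5)
```

In this sketch the SE gate scales only the residual branch before the identity addition, so the shortcut path is unchanged. The per-channel weights `s` play the role of the attention weights described in the abstract.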
Authors
Tang Yu (唐渔)
He Zhiqin (何志琴)
Zhou Yuhui (周宇辉)
Wu Qinmu (吴钦木)
Wang Xiao (王霄)
(Electrical Engineering College, Guizhou University, Guiyang 550025, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2023, No. 6, pp. 1864-1869 (6 pages)
Funding
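Science and Technology Foundation of Guizhou Province (Grant No. 黔科合支撑[2021]一般264)
Science and Technology Foundation of Guizhou Province (Grant No. 黔科合支撑[2021]一般442)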