Image Captioning Based on BERT Word Vectors and an Ordered Memory Network
Abstract: Current image captioning models built on the encoder-decoder framework do not account, at the encoding stage, for the way the same word differs across sentences, and they ignore the hierarchical structure of language sequences at the decoding stage. To address both problems, this paper studies image captioning with deep learning and designs a captioning model based on BERT word vectors and an ordered memory network, making full use of the visual features of the image and the textual information in the reference sentences. The model follows the encoder-decoder framework: the encoder extracts information from the image and the reference text, and the decoder outputs the predicted text. The encoder obtains image features by combining the Inception-v4 network with the channel and spatial attention mechanism (CBAM), and obtains reference-text information by vectorizing the reference text with the BERT model. The visual features and text information are then fed into the decoder, where a policy network and a value network with good decision-making ability guide decoding, and an ordered memory network (ON-LSTM) combined with adaptive attention generates the final caption. On the MS COCO Caption 2014 dataset, the model improves on the baseline model by 0.7%, 1.1%, 0.6% and 0.7% on BLEU-1, BLEU-4, CIDEr and METEOR, respectively, which indicates that it is an effective image captioning model.
Authors: YU Yi-wen (俞艺文); SHI Shui-cai (施水才); WANG Hong-jun (王洪俊). Affiliations: School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China; TRS Information Technology Co., Ltd, Beijing 100101, China
Source: Software Guide (《软件导刊》), 2023, No. 3, pp. 125-133 (9 pages)
Keywords: BERT; ordered memory network; image captioning; deep learning
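
The abstract names ON-LSTM as the decoder that captures the hierarchical structure of language. As a rough illustration only, the sketch below shows the cell-state update generally associated with ON-LSTM: "master" gates built from a cumulative softmax (cumax) decide which span of the cell state is kept, overwritten, or blended. This is an assumption about the standard ON-LSTM formulation, not the authors' implementation; the function names and the pure-Python list representation are hypothetical, and the adaptive attention and the policy and value networks mentioned in the abstract are omitted.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs):
    # Cumulative softmax: running sum of softmax(xs).
    # The result is non-decreasing and its last entry is 1.
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def on_lstm_cell_update(c_prev, c_hat, f_logits, i_logits, mf_logits, mi_logits):
    # Hypothetical helper: one ON-LSTM-style cell-state update.
    # c_prev: previous cell state; c_hat: candidate cell state.
    # f_logits / i_logits: pre-activations of the ordinary forget/input gates.
    # mf_logits / mi_logits: pre-activations of the master forget/input gates.
    f = [sigmoid(x) for x in f_logits]
    i = [sigmoid(x) for x in i_logits]
    master_f = cumax(mf_logits)                      # rises from 0 towards 1
    master_i = [1.0 - v for v in cumax(mi_logits)]   # falls from 1 towards 0
    c_new = []
    for k in range(len(c_prev)):
        overlap = master_f[k] * master_i[k]
        # Inside the overlap the ordinary gates decide; outside it the
        # master gates alone keep old content or write new content.
        f_hat = f[k] * overlap + (master_f[k] - overlap)
        i_hat = i[k] * overlap + (master_i[k] - overlap)
        c_new.append(f_hat * c_prev[k] + i_hat * c_hat[k])
    return c_new
```

In this sketch, a dimension where the master forget gate is close to 1 and the master input gate is close to 0 simply copies its previous cell value, which is how long-lived, higher-level information can persist across many decoding steps.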