
Image caption model with synchronous fusion of visual and semantic information
Abstract: Existing image captioning methods process visual and semantic information separately, lack structured information, and ignore the global information of the image. To address these problems, an image caption model with synchronous fusion of visual and semantic information (SG-sMLSTM) is proposed. The visual information is enhanced and refined by fusing the global image feature with the multimodal features of candidate regions, and structured semantic information is encoded on the basis of a scene graph. In the decoder, an sMLSTM structure is designed that uses an attention mechanism to fuse visual and semantic information synchronously and dynamically, so that the model receives more comprehensive information at each time step and adaptively attends to the more critical regions. Experimental results on the MSCOCO dataset show that the model generates more accurate captions and improves evaluation metric scores by about 3% over the baseline method.
Authors: PENG Yu-qing (彭玉青); PEI Yi-xin (裴一心); WANG Chen-xi (王晨曦); JIA Ya-min (贾亚敏). School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal list, 2023, No. 3, pp. 807-814 (8 pages)
Funding: Natural Science Foundation of Hebei Province (F2021202038).
Keywords: image caption; scene graph; multimodal; visual information; semantic information; attention mechanism; synchronous fusion
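
The abstract describes a decoder that attends to visual and semantic features at the same time step and fuses them. The record gives no equations, so the following is only a minimal sketch of that general idea, not the paper's sMLSTM: it assumes plain dot-product soft attention and fusion by concatenation, and every name in it (`attend`, `fuse_step`, the toy feature vectors) is hypothetical.

```python
import math

# Illustrative sketch only. Dot-product attention and concatenation are
# assumptions; the actual sMLSTM structure is not specified in the abstract.

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, features):
    """Soft attention: weight each feature by its similarity to the query."""
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context

def fuse_step(hidden, visual_feats, semantic_feats):
    """One decoding step: the same hidden state queries both modalities,
    and the two context vectors are concatenated (assumed fusion rule)."""
    v_weights, v_ctx = attend(hidden, visual_feats)
    s_weights, s_ctx = attend(hidden, semantic_feats)
    return v_ctx + s_ctx, v_weights, s_weights

# Toy inputs: three region features and two scene-graph node features.
hidden = [1.0, 0.0]
visual_feats = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
semantic_feats = [[1.0, 1.0], [1.0, 1.0]]

fused, v_w, s_w = fuse_step(hidden, visual_feats, semantic_feats)
```

In this toy run the first region receives the largest visual weight because it aligns with the hidden state, while the two identical semantic nodes share their weight equally. A real model would use learned projections and feed the fused vector into the recurrent cell at each step.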
