Abstract
Automatically generating a natural-language description of an image has attracted widespread attention in deep learning in recent years, both because of the importance of image description in practical applications and because it bridges two major areas of artificial intelligence: computer vision and natural language processing. Most previous models use template-based or simple encoder-decoder methods; the text they generate is structurally monotonous and cannot express the deeper meaning of an image through the relationships among the objects it contains. This paper proposes an image description method based on an attention mechanism and multimodal fusion. The attention mechanism is improved on top of an LSTM (Long Short-Term Memory) network, and a multimodal layer is added after the attention structure to fuse the contextual feature information of the image with the hidden state of the LSTM. The method is validated on two public datasets, MS COCO and Flickr 30K; the experimental results show that it is effective and makes the generated description sentences richer.
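The architecture described above can be illustrated with a minimal sketch: soft attention scores each image region feature against the LSTM hidden state, the weighted sum gives a context vector, and a multimodal layer fuses that context with the hidden state. This is an illustrative reading of the abstract, not the authors' implementation; the learned weight matrices are replaced by identity mappings (assumed for brevity), and the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend_and_fuse(regions, hidden):
    """Soft attention over image region features, followed by a
    multimodal fusion of the context vector and the LSTM hidden state.
    Learned projection matrices are omitted (identity) in this sketch."""
    # Attention score: dot product of each region feature with the hidden state
    scores = [sum(r_i * h_i for r_i, h_i in zip(r, hidden)) for r in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of region features
    context = [sum(w * r[d] for w, r in zip(weights, regions))
               for d in range(len(hidden))]
    # Multimodal layer: fuse context and hidden state through a tanh nonlinearity
    fused = [math.tanh(c + h) for c, h in zip(context, hidden)]
    return weights, fused
```

In a full captioning model, `fused` would feed a softmax over the vocabulary to predict the next word at each decoding step.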
Authors
NIU Bin
LI Jin-ze
FANG Chao
MA Li
XU He-ran
JI Xing-hai
NIU Bin; LI Jin-ze; FANG Chao; MA Li; XU He-ran; JI Xing-hai (College of Information, Liaoning University, Shenyang 110036, China; College of Information, Bohai University, Jinzhou 121001, China; 65735 Troops of PLA, Shenyang 118005, China)
Source
《辽宁大学学报(自然科学版)》
CAS
2019, No. 1, pp. 38-45 (8 pages)
Journal of Liaoning University: Natural Sciences Edition
Funding
Doctoral Scientific Research Start-up Foundation Guidance Program of the Liaoning Provincial Department of Science and Technology (20170520276)