Image Retrieval by Combining Recurrent Neural Network and Visual Attention Mechanism

Cited by: 7
Abstract  Objective: Image retrieval is an important task in computer vision, and image content description is the key to it. Accurate and complete descriptions of image content can significantly improve retrieval precision, but describing the content of a complex image is challenging. Traditional methods describe image content with a single fixed-length vector. A simple image contains one object, whereas a complex image may contain several; describing a complex image with the same fixed-length vector as a simple one is generally insufficient. This study therefore proposes a varying-length sequence description model, with the aim of enriching the expressive power of the feature encoding and improving retrieval precision. Method: We propose a sequence description model based on a recurrent neural network and a visual attention mechanism, which describes each image with a varying-length feature sequence. The model first extracts low-level features with a CNN (convolutional neural network), then generates a contextual representation of local features with an intermediate LSTM (long short-term memory), and finally produces a group of vectors describing the image with an attention LSTM. The attention mechanism allows the number of description vectors to match the number of labels of the described image. The model is end-to-end trainable and is trained with a label-level triplet loss function. The similarity between two images is computed with the Hungarian algorithm to complete the retrieval task.
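The abstract states that similarity between two images is computed by matching their varying-length vector groups with the Hungarian algorithm, but it does not specify the pairwise score between individual vectors; cosine similarity in the sketch below is an assumption, and `sequence_similarity` is a hypothetical name. A minimal sketch using SciPy's assignment solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sequence_similarity(desc_a, desc_b):
    """Similarity between two images described by vector groups.

    desc_a: (m, d) array of m description vectors for image A
    desc_b: (n, d) array of n description vectors for image B
    Matches vectors across the two groups with the Hungarian algorithm
    so that the total cosine similarity of matched pairs is maximized,
    then returns the mean similarity over the matched pairs.
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                           # pairwise cosine similarities, (m, n)
    row, col = linear_sum_assignment(-sim)  # negate: solver minimizes cost
    return sim[row, col].mean()
```

`linear_sum_assignment` accepts rectangular matrices, so the two groups may have different lengths, matching the varying-length descriptions in the paper; this optimal matching over all pairs is also why the abstract notes that querying is slow.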
We also study retrieval precision under different depths of multilayer LSTM by varying the number of LSTM layers. Result: Experiments were performed on two common datasets, MIRFLICKR-25K and NUS-WIDE, and compared against related methods; overall, the proposed model improves retrieval precision by 5 to 12 percentage points. In single-label image retrieval on MIRFLICKR-25K, the sequence description model improves the accuracy rate by 10 to 12 percentage points over the DNN-lai method, against which we also provide detailed comparative results. In multi-label image retrieval on NUS-WIDE, it improves by approximately 10 percentage points over the CCAITQ and DSRH methods. Because the extracted features have varying lengths, the Hungarian algorithm used to compute similarities between two images is time-consuming, so querying an image in the dataset takes a long time. Conclusion: This study presented a model that uses a recurrent neural network with an attention LSTM to generate descriptive sequences of an image. Compared with fixed-length descriptions, it significantly improves retrieval results on multi-label datasets and is well suited to multi-label image retrieval.
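The model is trained end-to-end with a label-level triplet loss, whose exact form the abstract does not give; the sketch below shows the generic triplet loss it builds on (anchor pulled toward a positive sharing a label, pushed from a negative with no shared label), with squared Euclidean distance and a margin of 0.2 as illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet loss over batches of description vectors.

    anchor, positive, negative: (batch, d) arrays. The loss is zero once
    each anchor is closer to its positive than to its negative by at
    least `margin` (a hyperparameter; 0.2 here is illustrative, not from
    the paper).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, margin + d_pos - d_neg).mean()
```

In the label-level setting described in the abstract, each image is represented by one vector per label, so triplets would be formed per label rather than per whole image.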
Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD, Peking University Core Journal, 2017, No. 2, pp. 241-248 (8 pages)
Funding: National Natural Science Foundation of China (U1435219)
Keywords: image retrieval; sequence description model; feature extraction; Hungarian algorithm; convolutional neural network (CNN); long short-term memory (LSTM)