LSTM-in-LSTM for generating long descriptions of images
Authors: Jun Song, Siliang Tang, Jun Xiao, Fei Wu, Zhongfei (Mark) Zhang. Computational Visual Media, 2016, No. 4, pp. 379-388 (10 pages).
In this paper, we propose an approach for generating rich, fine-grained textual descriptions of images. In particular, we use an LSTM-in-LSTM (long short-term memory) architecture, which consists of an inner LSTM and an outer LSTM. The inner LSTM effectively encodes the long-range implicit contextual interactions between visual cues (i.e., the spatially concurrent visual objects), while the outer LSTM generally captures the explicit multi-modal relationship between sentences and images (i.e., the correspondence of sentences and images). This architecture is capable of producing a long description by predicting one word at each time step, conditioned on the previously generated word, a hidden vector (via the outer LSTM), and a context vector of fine-grained visual cues (via the inner LSTM). Our model outperforms state-of-the-art methods on several benchmark datasets (Flickr8k, Flickr30k, MSCOCO) when used to generate long, rich, fine-grained descriptions of given images, in terms of four different metrics (BLEU, CIDEr, ROUGE-L, and METEOR).
Keywords: long short-term memory (LSTM); image description generation; computer vision; neural network
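The abstract describes a decoder that, at each time step, conditions the next word on the previous word, the outer LSTM's hidden state, and a context vector produced by an inner LSTM over fine-grained visual cues. Below is a minimal PyTorch sketch of that decoding scheme, not the authors' released code: the name `visual_cues`, the concatenation-based fusion of the word embedding with the context vector, and the zero-initialized outer state are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMinLSTMDecoder(nn.Module):
    """Sketch of an LSTM-in-LSTM caption decoder (assumed details)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, cue_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Inner LSTM: encodes the sequence of fine-grained visual cues
        # (spatially concurrent objects) into a single context vector.
        self.inner = nn.LSTM(cue_dim, hidden_dim, batch_first=True)
        # Outer LSTM: advances one step per generated word, conditioned
        # on the previous word embedding and the inner context vector
        # (fused here by concatenation, an assumption for illustration).
        self.outer = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_cues, prev_words):
        # visual_cues: (batch, num_cues, cue_dim)
        # prev_words:  (batch, T) word indices shifted right by one
        _, (context, _) = self.inner(visual_cues)  # (1, batch, hidden)
        context = context.squeeze(0)               # (batch, hidden)
        h = torch.zeros_like(context)              # outer hidden state
        c = torch.zeros_like(context)              # outer cell state
        logits = []
        for t in range(prev_words.size(1)):
            w = self.embed(prev_words[:, t])       # previous word
            h, c = self.outer(torch.cat([w, context], dim=1), (h, c))
            logits.append(self.out(h))             # next-word scores
        return torch.stack(logits, dim=1)          # (batch, T, vocab)
```

At inference time, the same step would be run greedily or with beam search, feeding each predicted word back in as `prev_words[:, t+1]`; the inner LSTM's context vector is computed once per image and reused at every step.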