Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval
Authors: Xue-Yang Qin, Li-Shuang Li, Jing-Yao Tang, Fei Hao, Mei-Ling Ge, Guang-Yao Pang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, Issue 4, pp. 811-826 (16 pages)
Image-text retrieval aims to capture the semantic correspondence between images and texts, which serves as a foundation and crucial component in multi-modal recommendations, search systems, and online shopping. Existing mainstream methods primarily focus on modeling the association of image-text pairs while neglecting the advantageous impact of multi-task learning on image-text retrieval. To this end, a multi-task visual semantic embedding network (MVSEN) is proposed for image-text retrieval. Specifically, we design two auxiliary tasks, including text-text matching and multi-label classification, for semantic constraints to improve the generalization and robustness of visual semantic embedding from a training perspective. Besides, we present an intra- and inter-modality interaction scheme to learn discriminative visual and textual feature representations by facilitating information flow within and between modalities. Subsequently, we utilize multi-layer graph convolutional networks in a cascading manner to infer the correlation of image-text pairs. Experimental results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MSCOCO, with rSum improvements of 8.2% and 3.0%, respectively.
Keywords: image-text retrieval, cross-modal retrieval, multi-task learning, graph convolutional network
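The abstract describes a multi-task training objective: the main image-text matching loss constrained by two auxiliary tasks, text-text matching and multi-label classification. The sketch below illustrates one common way such an objective is assembled (a hinge-based ranking loss plus a weighted binary cross-entropy term); the function names, margin, and task weights are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def hinge_triplet_loss(sim_pos, sim_neg, margin=0.2):
    """Ranking loss: the positive pair should outscore the negative by a margin."""
    return max(0.0, margin - sim_pos + sim_neg)

def multi_label_bce(probs, labels, eps=1e-12):
    """Binary cross-entropy over per-concept labels for the classification task."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def multi_task_loss(it_pos, it_neg, tt_pos, tt_neg, label_probs, labels,
                    w_tt=0.5, w_cls=0.5):
    """Total objective: image-text matching plus weighted auxiliary losses
    (text-text matching and multi-label classification). Weights are illustrative."""
    l_it = hinge_triplet_loss(it_pos, it_neg)    # main image-text ranking term
    l_tt = hinge_triplet_loss(tt_pos, tt_neg)    # auxiliary text-text matching term
    l_cls = multi_label_bce(label_probs, labels) # auxiliary classification term
    return l_it + w_tt * l_tt + w_cls * l_cls
```

In this setup the auxiliary terms act as regularizers during training only; at retrieval time, ranking would rely on the learned image-text similarity alone.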