
Joint feature approach for image-text cross-modal retrieval
Abstract: With the rapid development of deep learning, the performance of image-text cross-modal retrieval has improved significantly. However, existing methods match images and texts either only globally, using whole-image and whole-sentence information, or only locally, which limits how fully the image and text information is exploited and leaves retrieval performance with room for improvement. To fully exploit the latent semantic relationship between images and texts, this paper proposes a cross-modal retrieval model based on joint features, whose feature-extraction part uses two sub-networks to process the local and global features of images and texts respectively. During global-feature optimization, a bilinear layer structure based on the attention mechanism is designed to filter redundant information and narrow the granularity gap with the local features. To jointly optimize the two kinds of features, the loss function combines a triplet ranking loss, which captures the relations between modalities, with a semantic label classification loss, which preserves global semantic consistency. The proposed model is broadly applicable and can effectively improve the performance of models based only on local information. A series of experiments on the public Flickr30k and MS COCO datasets shows that the proposed model effectively improves cross-modal image-text retrieval: on the Flickr30k retrieval task it improves R@1 by about 5.1% for text retrieval and about 2.8% for image retrieval.
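The abstract describes jointly optimizing a bidirectional triplet ranking loss with a semantic label classification loss. A minimal sketch of such a joint objective follows; the cosine similarity, hinge margin, and weighting factor `lam` are illustrative assumptions, not the paper's actual formulation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_ranking_loss(img, txt_pos, txt_neg, img_neg, margin=0.2):
    """Bidirectional hinge loss: the matched image-text pair should
    outscore negatives in both retrieval directions."""
    s_pos = cosine(img, txt_pos)
    loss_i2t = max(0.0, margin - s_pos + cosine(img, txt_neg))      # image -> text
    loss_t2i = max(0.0, margin - s_pos + cosine(img_neg, txt_pos))  # text -> image
    return loss_i2t + loss_t2i

def label_classification_loss(logits, label):
    """Softmax cross-entropy over semantic label logits
    (the global-semantic-consistency term)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def joint_loss(img, txt_pos, txt_neg, img_neg, logits, label, lam=1.0):
    """Joint objective: ranking term plus lam times the classification term."""
    return (triplet_ranking_loss(img, txt_pos, txt_neg, img_neg)
            + lam * label_classification_loss(logits, label))
```

With orthogonal toy embeddings the ranking term vanishes for a well-separated positive pair and becomes positive when a negative caption is as close as the positive one, which is the behavior the margin enforces.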
Authors: GAO Dihui, SHENG Lijie, XU Xiaodong, MIAO Qiguang (School of Computer Science and Technology, Xidian University, Xi'an 710071, China; Key Laboratory of Big Data and Intelligent Vision, Xidian University, Xi'an 710071, China)
Source: Journal of Xidian University (indexed in EI, CAS, CSCD, Peking University Core), 2024, No. 4, pp. 128-138 (11 pages)
Funding: National Natural Science Foundation of China (62272364); Shaanxi Higher Continuing Education Teaching Reform Research Project (21XJZ004)
Keywords: cross-modal retrieval; deep learning; self-attention network; image retrieval