Adequate alignment and interaction for cross-modal retrieval

Abstract

Background: Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual semantic embedding (VSE) learning has shown promising improvements in image-text retrieval tasks. Most existing VSE models employ two independent encoders to extract features and then use complex methods to contextualize and aggregate these features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) because they do not consider intermediate interactions or adequate alignment between the modalities, these models cannot guarantee the discriminative ability of their representations; and (2) existing feature aggregators are susceptible to noisy regions, which may lead to unreasonable pooling coefficients and degrade the quality of the final aggregated features.

Methods: To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which together aim to learn adequate alignment and interaction of the aggregated features and thereby bridge the modality gap.

Results: Experiments on the Microsoft COCO and Flickr30k datasets demonstrated the superiority of our model over state-of-the-art methods.
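The abstract outlines a generic VSE pipeline: each modality yields a set of features, the features are aggregated into one holistic embedding using pooling coefficients, and images and texts are matched by similarity in the shared space. The following is a minimal pure-Python sketch of those generic steps only, assuming softmax-normalized pooling scores and cosine similarity. It is not the authors' alignment module or multimodal fusion encoder, and the function names (`softmax`, `weighted_pool`, `cosine`, `recall_at_k`) are hypothetical. It also illustrates limitation (2): when a noisy region receives a large pooling coefficient, the pooled embedding drifts away from the matching caption.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-feature scores into pooling coefficients that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pool(features: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Aggregate per-region (or per-word) features into one holistic vector.

    Assumption: each feature is weighted by its softmax-normalized score, so a
    noisy region that receives a large score dominates the pooled embedding.
    """
    coeffs = softmax(scores)
    dim = len(features[0])
    return [sum(c * f[d] for c, f in zip(coeffs, features)) for d in range(dim)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two embeddings in the shared space."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(images: Sequence[Vector], texts: Sequence[Vector], k: int) -> float:
    """Image-to-text Recall@K, assuming texts[i] is the caption of images[i]."""
    hits = 0
    for i, img in enumerate(images):
        # Rank every caption by similarity to this image, best first.
        ranked = sorted(range(len(texts)), key=lambda j: cosine(img, texts[j]), reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(images)


if __name__ == "__main__":
    # Two hypothetical image regions: one relevant, one noisy.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    clean = weighted_pool(regions, [2.0, -2.0])  # coefficients favour region 0
    noisy = weighted_pool(regions, [-2.0, 2.0])  # the noisy region wins the pooling
    caption = [1.0, 0.0]
    print(cosine(clean, caption) > cosine(noisy, caption))  # True
    print(recall_at_k([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]], 1))  # 1.0
```

Recall@K is the metric commonly reported on Microsoft COCO and Flickr30k; real systems compute it over thousands of embeddings with batched matrix products rather than this per-pair loop.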

Authors: Mingkang WANG, Min MENG, Jigang LIU, Jigang WU

Affiliations: School of Computer Science and Technology; Ping An Life Insurance of China

Source: Virtual Reality & Intelligent Hardware (Chinese-English bilingual edition), EI-indexed, 2023, Issue 6, pp. 509-522 (14 pages)

Funding: Supported by the National Natural Science Foundation of China (62172109, 62072118), the National Science Foundation of Guangdong Province (2022A1515010322), the Guangdong Basic and Applied Basic Research Foundation (2021B1515120010), and the Huangpu International Sci&Tech Cooperation Foundation of Guangzhou (2021GH12).

Keywords: Cross-modal retrieval; Visual semantic embedding; Feature aggregation; Transformer

CLC number: TP391.3 [Automation and Computer Technology / Computer Application Technology]
