Abstract
To address the problems of extracting effective fine-grained features from images and text in cross-modal pedestrian retrieval, and of aligning images with natural-language descriptions across modalities, a cross-modal pedestrian retrieval model based on multi-scale feature enhancement and alignment is proposed. The model introduces a multimodal pre-trained model and constructs a text-guided masked image modeling auxiliary task to fully realize cross-modal interaction, enhancing its ability to learn local image details without explicit annotation. In addition, to counter the identity confusion common among pedestrian images, a global image feature matching auxiliary task is designed to guide the model toward identity-relevant visual features. Experimental results on the public CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets show that the proposed model surpasses existing mainstream models, achieving Rank-1 accuracies of 72.47%, 62.71%, and 59.25%, respectively, and thus high-accuracy cross-modal pedestrian retrieval.
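As a concrete illustration of the two auxiliary tasks named in the abstract, below is a minimal PyTorch sketch of a text-guided masked image modeling head and an identity-aware global matching loss. The paper's actual architecture, feature dimensions, mask ratio, and loss formulation are not given here, so every module name, hyperparameter, and design choice in the sketch is an assumption for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedMIMHead(nn.Module):
    """Reconstructs masked image patch embeddings by cross-attending to text
    tokens, so the vision branch learns local details guided by the caption.
    dim=512 mirrors CLIP's projection width; this is an assumption."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.decoder = nn.Linear(dim, dim)

    def forward(self, patch_emb, text_emb, mask):
        # patch_emb: (B, N, D) patch embeddings from the CLIP image encoder
        # text_emb:  (B, L, D) token embeddings from the CLIP text encoder
        # mask:      (B, N) bool, True where a patch is masked out
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(patch_emb), patch_emb)
        # Text tokens serve as keys/values: the caption guides reconstruction.
        attn_out, _ = self.cross_attn(query=x, key=text_emb, value=text_emb)
        pred = self.decoder(self.norm(x + attn_out))
        # Reconstruction loss is computed only on the masked positions.
        return F.mse_loss(pred[mask], patch_emb[mask].detach())

def global_matching_loss(img_feat, txt_feat, person_ids, temperature=0.07):
    """Identity-aware global image-text matching (assumed contrastive form):
    pairs sharing a person ID are treated as positives, pushing the encoders
    toward identity-relevant rather than appearance-only features."""
    img_feat = F.normalize(img_feat, dim=-1)        # (B, D)
    txt_feat = F.normalize(txt_feat, dim=-1)        # (B, D)
    logits = img_feat @ txt_feat.t() / temperature  # (B, B) similarity matrix
    pos = (person_ids.unsqueeze(0) == person_ids.unsqueeze(1)).float()
    pos = pos / pos.sum(dim=1, keepdim=True)        # soft targets over positives
    loss_i2t = -(F.log_softmax(logits, dim=1) * pos).sum(1).mean()
    loss_t2i = -(F.log_softmax(logits.t(), dim=1) * pos).sum(1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In training, both terms would typically be added to the main retrieval objective as weighted auxiliary losses; the weights are not specified in the abstract.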
Authors
XU Ling; MIAO Yi; ZHANG Weifeng (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China; School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China)
Source
《现代电子技术》 (Modern Electronics Technique), Peking University Core Journal (北大核心), 2024, No. 22, pp. 44-50 (7 pages)
Keywords
cross-modal pedestrian retrieval
multi-scale feature enhancement
multimodal alignment
CLIP
image mask
cross-modal interaction
cross-attention