A Self-Supervised Learning Approach for Text-Based Person Search

Cited by: 1
Abstract: Text-based person search aims to retrieve images of a target pedestrian from a large-scale database using a text description as the query, and it has high practical value for social and public safety. Unlike conventional cross-modal retrieval tasks, every category in this task is a pedestrian. Different pedestrians differ only slightly in appearance and are therefore hard to tell apart, and limited shooting conditions usually produce poor image quality. Effectively extracting more robust and discriminative visual features is therefore a key challenge for this task. To address it, a text-based person search algorithm based on self-supervised learning is designed. It combines self-supervised learning with text-based person search in a multitask learning framework: the two tasks are trained simultaneously and share model parameters. The self-supervised task serves as an auxiliary task whose purpose is to learn more robust and discriminative visual features for person search. Specifically, visual and textual features are first extracted, and image inpainting is adopted as a self-supervised task to learn richer semantic information and improve robustness to occluded images. Exploiting the particular characteristics of pedestrian images, a mirror-flip prediction task is further designed: by training the network to predict whether an image has been mirror-flipped, the model learns discriminative details that help the person search task separate hard samples. Extensive experiments on a public dataset demonstrate the superiority and effectiveness of the proposed approach, which improves the Top-1 accuracy of person search by 2.77%. The results also show that the two self-supervised tasks are complementary, and using them together yields better retrieval performance.
Authors: Ji Zhong; Hu Junhua; Ding Xuewen; Li Shengjia (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China; R&D Department, China Academy of Launch Vehicle Technology, Beijing 100076, China)
Source: Journal of Tianjin University: Science and Technology, 2023, No. 2, pp. 169-176 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
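Funding: Tianjin Natural Science Foundation (19JCYBJC16000); National Natural Science Foundation of China (62176178); Science and Technology Commissioner Project of the Tianjin Municipal Science and Technology Commission (20YDTPJC01110); Qian Xuesen Youth Innovation Fund of China Aerospace Science and Technology Corporation.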
Keywords: person search; cross-modal analysis; self-supervised learning; multitask learning
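The abstract describes the two self-supervised tasks and their multitask combination only at a high level. The following is a minimal sketch of how such training signals could be built and combined with the retrieval loss. It is not the authors' implementation, and it rests on several assumptions: images are grayscale grids stored as nested Python lists, inpainting erases one rectangular patch and is scored with an L2 loss, flip prediction uses binary cross-entropy, and the total loss is a weighted sum. All function names and the weights `w_inpaint` and `w_flip` are hypothetical. The network that would produce the reconstructions and flip probabilities is omitted.

```python
import math
import random
from typing import List, Tuple

# Assumption: a pedestrian image is a grayscale H x W grid of floats in [0, 1].
Image = List[List[float]]


def copy_image(img: Image) -> Image:
    """Return a copy so the caller's image is never mutated."""
    return [row[:] for row in img]


def mirror_flip(img: Image) -> Image:
    """Horizontal mirror flip: reverse every row (left-right swap)."""
    return [row[::-1] for row in img]


def make_flip_sample(img: Image, rng: random.Random) -> Tuple[Image, int]:
    """Flip-prediction pretext sample: mirror-flip with probability 0.5.

    Returns the (possibly flipped) image and its label: 1 = flipped, 0 = original.
    """
    if rng.random() < 0.5:
        return mirror_flip(img), 1
    return copy_image(img), 0


def make_inpainting_sample(
    img: Image, top: int, left: int, height: int, width: int, fill: float = 0.0
) -> Tuple[Image, Image]:
    """Inpainting pretext sample: erase a rectangle to simulate occlusion.

    Returns (masked input, reconstruction target = untouched original).
    """
    masked = copy_image(img)
    for i in range(top, min(top + height, len(img))):
        for j in range(left, min(left + width, len(img[i]))):
            masked[i][j] = fill
    return masked, copy_image(img)


def inpainting_loss(pred: Image, target: Image) -> float:
    """Mean squared reconstruction error over all pixels (assumed L2 loss)."""
    errors = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(errors) / len(errors)


def flip_loss(prob_flipped: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the predicted flip probability and the label."""
    p = min(max(prob_flipped, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(
    search_loss: float,
    inpaint_loss: float,
    flip_pred_loss: float,
    w_inpaint: float = 0.5,  # hypothetical weight for the inpainting task
    w_flip: float = 0.5,     # hypothetical weight for the flip-prediction task
) -> float:
    """Main retrieval loss plus weighted auxiliary self-supervised losses."""
    return search_loss + w_inpaint * inpaint_loss + w_flip * flip_pred_loss


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    masked, target = make_inpainting_sample(img, top=0, left=1, height=1, width=2)
    flipped, label = make_flip_sample(img, rng)
    l_inp = inpainting_loss(masked, target)  # stand-in for a network's reconstruction
    l_flip = flip_loss(0.5, label)           # stand-in for a network's prediction
    print(masked, label, multitask_loss(1.0, l_inp, l_flip))
```

In a full model, the reconstructed image and `prob_flipped` would come from lightweight heads attached to the shared visual encoder, so gradients from both auxiliary losses update the same parameters used for retrieval. The retrieval loss is passed in as a plain number here because the abstract does not specify its form.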