Abstract
Text-based person search aims to retrieve, from a person database, the images of a pedestrian that match a given text description, and it has recently attracted wide attention from both academia and industry. The task faces two challenges at once: fine-grained retrieval and the heterogeneous gap between images and texts. Some methods use supervised attribute learning to extract attribute-related features and associate images and texts at a fine-grained level. Attribute labels, however, are hard to obtain, so these methods perform poorly in practice. How to extract attribute-related features without attribute annotations and establish fine-grained cross-modal semantic associations has therefore become a key problem. To address it, this study incorporates pre-training techniques and proposes a text-based person search method based on virtual attribute learning, which builds fine-grained cross-modal semantic associations through unsupervised attribute learning. First, based on the invariance and cross-modal semantic consistency of pedestrian attributes, a semantics-guided attribute decoupling method is proposed, which uses pedestrian identity labels as the supervision signal to guide the model to decouple attribute-related features. Second, a semantic graph is constructed from the relations between attributes, and a feature learning module based on semantic reasoning is proposed, which exchanges information among attributes through a graph model to strengthen the cross-modal identification ability of the features. The proposed method is compared with existing methods on the public text-based person search dataset CUHK-PEDES and the cross-modal retrieval dataset Flickr30k, and the experimental results demonstrate its effectiveness.
Authors
王成济
苏家威
罗志明
曹冬林
林耀进
李绍滋
WANG Cheng-Ji; SU Jia-Wei; LUO Zhi-Ming; CAO Dong-Lin; LIN Yao-Jin; LI Shao-Zi (School of Informatics, Xiamen University, Xiamen 361005, China; School of Computer Science, Minnan Normal University, Zhangzhou 363000, China; Key Laboratory of Data Science and Intelligence Application (Minnan Normal University), Zhangzhou 363000, China)
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core (北大核心)
2023, Issue 5, pp. 2035-2050 (16 pages)
Funding
National Natural Science Foundation of China (61876159, 62076210, 62076116).
Keywords
person search
cross-modality
attribute learning
pre-training