Funding: Supported in part by the National Natural Science Foundation of China under Grants 62177029 and 62307025, in part by the Startup Foundation for Introducing Talent of Nanjing University of Posts and Telecommunications under Grant NY221041, and in part by the General Project of the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grants 22KJB520025 and 23KJD580.
Abstract: Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology for smart public spaces such as cities, campuses, and libraries. It aims to match pedestrians across visible-light and infrared images for video surveillance, which poses the challenge of exploring cross-modal shared information accurately and efficiently. Multi-granularity feature learning methods have therefore been applied in VI-ReID to extract the latent multi-granularity semantic information related to pedestrian body-structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of a cross-modal learning network: the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), which incorporates a Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also offers a new perspective on designing multi-granularity learning networks: by incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features related to body-structure information. A Local Information Enhancement (LIE) module augments high-performance features, and a multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and improves the accuracy of VI-ReID.
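The abstract does not include code; as a rough illustration of what a multi-granularity feature head can look like, the PyTorch sketch below partitions a backbone feature map into horizontal part stripes at several granularities and pools each stripe into an embedding, loosely in the spirit of the mGFD/pIS strategies described above. All class and parameter names are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiGranularityHead(nn.Module):
    """Illustrative sketch (not the authors' code): split a backbone
    feature map into horizontal part stripes at several granularities,
    loosely mirroring the idea of body-structure-aware local features
    learned alongside a global one."""

    def __init__(self, in_channels=2048, embed_dim=256, granularities=(1, 2, 3)):
        super().__init__()
        self.granularities = granularities
        # one 1x1 reduction branch per stripe, per granularity
        self.reducers = nn.ModuleList()
        for g in granularities:
            self.reducers.append(nn.ModuleList(
                nn.Sequential(nn.Conv2d(in_channels, embed_dim, 1),
                              nn.BatchNorm2d(embed_dim),
                              nn.ReLU(inplace=True))
                for _ in range(g)))

    def forward(self, fmap):                       # fmap: (B, C, H, W)
        feats = []
        for g, reducers in zip(self.granularities, self.reducers):
            stripes = torch.chunk(fmap, g, dim=2)  # split along height
            for stripe, reduce in zip(stripes, reducers):
                pooled = reduce(stripe).mean(dim=(2, 3))  # GAP per stripe
                feats.append(pooled)
        return feats  # list of (B, embed_dim) global/part embeddings

# usage sketch: fmap = backbone(images); parts = MultiGranularityHead()(fmap)
```

Each granularity level yields progressively finer part embeddings (1 global, 2 halves, 3 thirds here), which a multi-granularity joint loss could then supervise branch by branch.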
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2019YFF0301800, the National Natural Science Foundation of China under Grant No. 61379106, and the Shandong Provincial Natural Science Foundation under Grant Nos. ZR2013FM036 and ZR2015FM011.
Abstract: Visible-infrared person re-identification has attracted extensive attention from the community due to its great application potential in video surveillance. There are large modality discrepancies between visible and infrared images caused by their different imaging mechanisms. Existing studies alleviate modality discrepancies by aligning modality distributions or extracting modality-shared features from the original images. However, they ignore a key solution: converting visible images directly to gray images, which is an efficient and effective way to reduce modality discrepancies. In this paper, we transform the cross-modality person re-identification task from visible-infrared images to gray-infrared images, a setting we refer to as the minimal modality discrepancy. In addition, we propose a pyramid feature integration network (PFINet) that mines the discriminative refined features of pedestrian images and fuses high-level, semantically strong features to build a robust pedestrian representation. Specifically, PFINet first performs feature extraction from concrete to abstract along with top-down semantic transfer to obtain multi-scale feature maps. Second, the multi-scale feature maps are fed into a discriminative-region response module that emphasizes identity-discriminative regions via a spatial attention mechanism. Finally, the pedestrian representation is obtained by feature integration. Extensive experiments demonstrate the effectiveness of PFINet, which achieves a rank-1 accuracy of 81.95% and an mAP of 74.49% under the multi-all evaluation mode of the SYSU-MM01 dataset.
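The "minimal modality discrepancy" idea above, converting visible images to gray before matching them against infrared, is concrete enough to sketch. The snippet below shows one common way to perform that conversion in PyTorch, using the standard ITU-R BT.601 luma weights; the paper may use a different conversion, and the function name is ours.

```python
import torch

def visible_to_gray(batch):
    """Illustrative sketch of the gray-conversion step described in the
    abstract: replace the RGB channels of visible images with a single
    luminance channel replicated three times, so visible and infrared
    inputs share a one-channel appearance space. Weights follow the
    ITU-R BT.601 luma formula (an assumption, not confirmed by the paper)."""
    # batch: (B, 3, H, W) visible images, channels in RGB order
    w = torch.tensor([0.299, 0.587, 0.114], device=batch.device)
    gray = (batch * w.view(1, 3, 1, 1)).sum(dim=1, keepdim=True)  # (B, 1, H, W)
    return gray.repeat(1, 3, 1, 1)  # keep 3 channels for a standard backbone
```

Because the transform is applied only to the visible stream, the downstream network sees gray-infrared pairs, which is exactly the reduced-discrepancy setting the abstract argues for.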
Abstract: The main difficulties in cross-modality person re-identification stem from the large modality discrepancy and intra-modality variation between pedestrian images. To address these problems, a network combining multi-scale features with confusion learning is proposed. To achieve efficient feature extraction and reduce intra-modality variation, the network is designed so that multi-scale features complement each other, learning locally refined features and globally coarse features of pedestrians separately and strengthening the network's representational capability from both fine-grained and coarse-grained perspectives. A confusion learning strategy blurs the network's modality-discrimination feedback and mines stable, effective modality-irrelevant attributes to cope with the modality discrepancy, improving the robustness of the features to modality changes. Under the all-search mode of the large-scale SYSU-MM01 dataset, the algorithm achieves a rank-1 accuracy of 76.69% and a mean average precision (mAP) of 72.45%; under the visible-to-infrared mode of the RegDB dataset, it achieves a rank-1 accuracy of 94.62% and an mAP of 94.60%, outperforming existing mainstream methods and verifying the effectiveness of the proposed approach.
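The abstract does not specify how the modality-discrimination feedback is "blurred". One common realization of such confusion objectives is a modality classifier trained through a gradient reversal layer, as in domain-adversarial training; the PyTorch sketch below shows that pattern as a plausible reading, not the paper's actual design, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the
    backward pass, so the feature extractor is pushed to *confuse* the
    modality classifier rather than help it."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class ModalityConfuser(nn.Module):
    """Hypothetical modality classifier behind a gradient reversal layer."""
    def __init__(self, feat_dim=2048, lam=1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(feat_dim, 256),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(256, 2))  # visible vs. infrared

    def forward(self, feats):                        # feats: (B, feat_dim)
        return self.clf(GradReverse.apply(feats, self.lam))

# training sketch: minimizing cross-entropy on these logits trains the
# classifier, while the reversed gradient drives the backbone toward
# modality-irrelevant features.
# logits = ModalityConfuser()(features)
# loss_confuse = nn.CrossEntropyLoss()(logits, modality_labels)
```

If the classifier cannot tell which modality a feature came from, the feature carries mainly modality-irrelevant pedestrian attributes, which matches the robustness goal stated in the abstract.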