Person search mainly consists of two submissions,namely Person Detection and Person Re-identification(reID).Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Network(CNN)(e.g.,ResNet).Wh...Person search mainly consists of two submissions,namely Person Detection and Person Re-identification(reID).Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Network(CNN)(e.g.,ResNet).While these structures may detect high-quality bounding boxes,they seem to degrade the performance of re-ID.To address this issue,this paper proposes a Dual-Transformer Head Network(DTHN)for end-to-end person search,which contains two independent Transformer heads,a box head for detecting the bounding box and extracting efficient bounding box feature,and a re-ID head for capturing high-quality re-ID features for the re-ID task.Specifically,after the image goes through the ResNet backbone network to extract features,the Region Proposal Network(RPN)proposes possible bounding boxes.The box head then extracts more efficient features within these bounding boxes for detection.Following this,the re-ID head computes the occluded attention of the features in these bounding boxes and distinguishes them from other persons or backgrounds.Extensive experiments on two widely used benchmark datasets,CUHK-SYSU and PRW,achieve state-of-the-art performance levels,94.9 mAP and 95.3 top-1 scores on the CUHK-SYSU dataset,and 51.6 mAP and 87.6 top-1 scores on the PRW dataset,which demonstrates the advantages of this paper’s approach.The efficiency comparison also shows our method is highly efficient in both time and space.展开更多
基金supported by the Natural Science Foundation of Shanghai under Grant 21ZR1426500the National Natural Science Foundation of China under Grant 61873160.
文摘Person search mainly consists of two submissions,namely Person Detection and Person Re-identification(reID).Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Network(CNN)(e.g.,ResNet).While these structures may detect high-quality bounding boxes,they seem to degrade the performance of re-ID.To address this issue,this paper proposes a Dual-Transformer Head Network(DTHN)for end-to-end person search,which contains two independent Transformer heads,a box head for detecting the bounding box and extracting efficient bounding box feature,and a re-ID head for capturing high-quality re-ID features for the re-ID task.Specifically,after the image goes through the ResNet backbone network to extract features,the Region Proposal Network(RPN)proposes possible bounding boxes.The box head then extracts more efficient features within these bounding boxes for detection.Following this,the re-ID head computes the occluded attention of the features in these bounding boxes and distinguishes them from other persons or backgrounds.Extensive experiments on two widely used benchmark datasets,CUHK-SYSU and PRW,achieve state-of-the-art performance levels,94.9 mAP and 95.3 top-1 scores on the CUHK-SYSU dataset,and 51.6 mAP and 87.6 top-1 scores on the PRW dataset,which demonstrates the advantages of this paper’s approach.The efficiency comparison also shows our method is highly efficient in both time and space.