The attention mechanism can extract salient features from images, which has been proved effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules have two shortcomings. On the one hand, they mostly use global average pooling to generate context descriptors, without highlighting the guiding role of salient information in descriptor generation, so the representational ability of the final attention mask is insufficient. On the other hand, the design of most attention modules is complicated, which greatly increases the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed in the SR block to further process the generated intermediate masks, both to nonlinearize the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the widely used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets; in particular, the mean Average Precision (mAP) on the Occluded-Duke dataset exceeds the state-of-the-art (SOTA) algorithm by 4.49%.
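The abstract does not spell out the SR block in code; as a rough illustration of the stated idea (adaptively fusing a global average-pooled descriptor with a salient max-pooled descriptor, then refining the result with a channel-reducing squeeze-excitation unit), the following PyTorch sketch shows one plausible form. The module name, reduction ratio, and learnable fusion weight are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SRBlockSketch(nn.Module):
    """Illustrative recalibration block: adaptive fusion of average- and
    max-pooled channel descriptors, refined by a squeeze-excitation unit.
    Hypothetical layout; not the paper's exact SR block."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable fusion weight
        self.excite = nn.Sequential(                  # channel-reducing SE unit
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                             # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                      # global context descriptor
        mx = x.amax(dim=(2, 3))                       # salient (peak) descriptor
        a = torch.sigmoid(self.alpha)
        fused = a * avg + (1.0 - a) * mx              # adaptive weighted fusion
        mask = self.excite(fused).unsqueeze(-1).unsqueeze(-1)
        return x * mask                               # recalibrated feature map
```

Inserted after each stage of a ResNet-50, such a unit adds only two small linear layers per stage, which is consistent with the paper's emphasis on low computational cost.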
In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult due to the absence of intermediate-state supervision, making smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the intermediate domain best suited for the model to gradually adapt to the change in feature distribution from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. Our framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG module is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM module, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate-domain features and their semantic relevance, achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which utilizes an attention mechanism to enhance the model's focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. Our proposed method outperforms existing technologies on all common UDA re-ID tasks and improves the mean Average Precision (mAP) by 2.3% on the Market-to-Duke task compared with state-of-the-art (SOTA) methods.
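The abstract mentions aligning source and target features with maximum mean discrepancy (MMD). A minimal Gaussian-kernel MMD estimator, the standard building block such a constraint would rest on, can be written as follows; the kernel bandwidth and batch shapes are illustrative assumptions rather than the paper's settings.

```python
import torch

def gaussian_mmd(source: torch.Tensor, target: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimator of squared MMD between two feature batches
    of shape (B_s, D) and (B_t, D) under a Gaussian RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b, p=2).pow(2)        # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    k_ss = kernel(source, source).mean()
    k_tt = kernel(target, target).mean()
    k_st = kernel(source, target).mean()
    return k_ss + k_tt - 2.0 * k_st               # >= 0 up to estimation noise

# usage: add lambda * gaussian_mmd(f_src, f_tgt) to the training objective
```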
Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology in smart public facilities such as cities, campuses and libraries. It aims to match pedestrians in visible-light and infrared images for video surveillance, which poses a challenge in exploring cross-modal shared information accurately and efficiently. Therefore, multi-granularity feature learning methods have been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body-structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks, the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), incorporating the Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks. By incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features concerning body-structure information. The Local Information Enhancement (LIE) module augments high-performance features in VI-ReID, and the multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID.
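The abstract does not detail how the multiple granularities are formed. The common layout that such modules build on, a global descriptor plus progressively finer horizontal body-part descriptors, is sketched below; the branch counts and pooling choice are illustrative assumptions, and the MPFF module is presumably a more elaborate variant.

```python
import torch
import torch.nn.functional as F

def multi_granularity_descriptors(feat: torch.Tensor):
    """Split a backbone feature map (B, C, H, W) into global, 2-part and
    3-part horizontal descriptors -- a common multi-granularity layout."""
    descriptors = []
    for parts in (1, 2, 3):                              # coarse -> fine granularity
        pooled = F.adaptive_avg_pool2d(feat, (parts, 1)) # (B, C, parts, 1)
        for p in range(parts):
            descriptors.append(pooled[:, :, p, 0])       # one (B, C) vector per stripe
    return descriptors                                   # 1 + 2 + 3 = 6 descriptors
```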
Person re-identification is a prevalent technology deployed in intelligent surveillance. There have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera renders the resolution of the captured pedestrians inconsistent. When low-resolution (LR) images in the query set are matched with high-resolution (HR) images in the gallery set, the performance of the pedestrian matching task degrades because critical pedestrian information is absent from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. Firstly, we use the multi-resolution analysis principle of the wavelet transform to separately process the low-frequency and high-frequency regions of LR images, which is applied to restore the lost detail information of LR images. Then, we devise a residual knowledge-constrained loss function that transfers knowledge between the two streams of LR images and HR images to access pedestrian-invariant features at various resolutions. Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
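A single-level 2-D Haar decomposition, which separates an image into the low-frequency approximation and high-frequency detail sub-bands the abstract refers to, can be sketched as follows. This is a plain NumPy illustration; the wavelet basis and the number of decomposition levels used in the paper are not specified here and are assumed.

```python
import numpy as np

def haar_dwt2(img: np.ndarray):
    """One-level 2-D Haar wavelet transform of a grayscale image with even
    height and width. Returns (LL, LH, HL, HH) sub-bands at half resolution."""
    a = img[0::2, 0::2].astype(np.float64)
    b = img[0::2, 1::2].astype(np.float64)
    c = img[1::2, 0::2].astype(np.float64)
    d = img[1::2, 1::2].astype(np.float64)
    ll = (a + b + c + d) / 2.0   # low-frequency approximation (coarse content)
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

# the LL band carries the coarse structure, while LH/HL/HH carry the fine
# detail that low-resolution pedestrian images tend to lose
```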
Visible-infrared person re-identification (VIPR) is a cross-modal retrieval task that searches for a target in a gallery captured by cameras of different spectra. The severe challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images. To address this, this paper proposes a query-related cluster (QRC) method for VIPR. Firstly, this paper uses an attention mechanism to calculate the similarity relation between a visible query and the infrared images with the same identity in the gallery. Secondly, the infrared images sharing the query's identity are aggregated using this similarity relation to form a dynamic clustering center corresponding to the query image. Thirdly, a QRC loss function is designed to enlarge the similarity between the query image and its dynamic cluster center, so as to achieve query-related clustering and compact the intra-class variations. Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can well characterize intra-class variations in VIPR. Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, acquiring a 90.77% rank-1 identification rate on the RegDB dataset.
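As a rough sketch of the query-related clustering idea (attention weights between a visible query and its same-identity infrared gallery features define a dynamic cluster center, and the loss pulls the query toward that center), one hedged PyTorch formulation is given below; the temperature and the exact functional form of the QRC loss are assumptions.

```python
import torch
import torch.nn.functional as F

def qrc_style_loss(query: torch.Tensor, gallery: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """query: (D,) visible feature; gallery: (N, D) infrared features of the
    same identity. Builds an attention-weighted dynamic center and maximizes
    the query-center cosine similarity."""
    q = F.normalize(query, dim=0)
    g = F.normalize(gallery, dim=1)
    attn = F.softmax(g @ q / tau, dim=0)                             # (N,) attention over gallery
    center = F.normalize((attn.unsqueeze(1) * g).sum(dim=0), dim=0)  # dynamic cluster center
    return 1.0 - torch.dot(q, center)                                # small when query and center agree
```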
Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model-training solution. However, the massive personal data required for training contain private information, with a significant risk of data leakage in cloud environments, and also lead to significant communication overheads. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between the local and global models. In addition, we propose a lightweight person ReID network, named the multi-branch combined depth space network (MB-CDNet), to reduce the computing-resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
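MOON's model-contrastive loss compares the representation produced by the local model being trained with the representations of the global model and of the previous-round local model on the same batch. A minimal sketch following the commonly cited formulation is shown below; the temperature is an assumed hyper-parameter.

```python
import torch
import torch.nn.functional as F

def model_contrastive_loss(z_local: torch.Tensor,
                           z_global: torch.Tensor,
                           z_prev: torch.Tensor,
                           tau: float = 0.5) -> torch.Tensor:
    """z_*: (B, D) representations of the same batch from the current local
    model, the global model, and the previous-round local model."""
    sim_glob = F.cosine_similarity(z_local, z_global, dim=1) / tau
    sim_prev = F.cosine_similarity(z_local, z_prev, dim=1) / tau
    logits = torch.stack([sim_glob, sim_prev], dim=1)    # (B, 2)
    labels = torch.zeros(z_local.size(0), dtype=torch.long,
                         device=z_local.device)          # pull toward the global model
    return F.cross_entropy(logits, labels)
```

Back-propagating this term alongside the usual ReID loss keeps each edge server's local update close to the shared global model, which is the correction the warmup stage is described as providing.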
Person re-identification (ReID) is a sub-problem of image retrieval. It is a technology that uses computer vision to identify a specific pedestrian in a collection of pictures or videos, where the cross-device pedestrian images are taken from surveillance footage. At present, most ReID methods deal with matching between visible and visible images, but with the continuous improvement of security monitoring systems, more and more infrared cameras are used to monitor at night or in dim light. Due to the imaging differences between infrared cameras and RGB cameras, there is a huge visual difference between cross-modality images, so traditional ReID methods are difficult to apply in this scene. In view of this situation, studying pedestrian matching between the visible and infrared modalities is particularly crucial. Visible-infrared person re-identification (VI-ReID) was first proposed in 2017; it has since attracted more and more attention, and many advanced methods have emerged.
Person re-identification has emerged as a hotspot of computer vision research due to the growing demands of social public safety and the rapid development of intelligent surveillance networks. Person re-identification (Re-ID) in video surveillance systems can track and identify suspicious people and support statistical analysis of persons. The purpose of person re-identification is to recognize the same person across different cameras. Deep learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity. The purpose of this paper is to help researchers better understand where person re-identification research stands at the moment and where it is headed. Firstly, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
Cross-modality pedestrian re-identification has important applications in the field of surveillance. Due to variations in posture, camera perspective, and camera modality, some salient pedestrian features are difficult to use as effective retrieval cues. Therefore, it becomes a challenge to design an effective strategy to extract more discriminative pedestrian details. Although many effective methods for detailed feature extraction have been proposed, there are still shortcomings in filtering background and modality noise. To further purify the features, a pure detail feature extraction network (PDFENet) is proposed for VI-ReID. PDFENet includes three modules: the adaptive detail mask generation module (ADMG), the inter-detail interaction module (IDI) and cross-modality cross-entropy (CMCE). ADMG and IDI use human joints and their semantic associations to suppress background noise in the features. CMCE guides the model to ignore modality noise by generating modality-shared feature labels. Specifically, ADMG generates masks for pedestrian details based on pose estimation. The masks are used to suppress background information and enhance pedestrian detail information. Besides, IDI mines the semantic relations among details to further refine the features. Finally, CMCE cross-combines classifiers and features to generate modality-shared feature labels to guide model training. Extensive ablation experiments as well as visualization results demonstrate the effectiveness of PDFENet in eliminating background and modality noise. In addition, comparison experiments on two publicly available datasets also show the competitiveness of our approach.
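The ADMG idea, turning estimated joint locations into soft masks that suppress background, can be illustrated with a simple Gaussian-heatmap mask; the joint format, heatmap width, and the way the mask is applied to the feature map are assumptions made for illustration rather than the paper's design.

```python
import torch

def joint_detail_mask(joints: torch.Tensor, h: int, w: int, sigma: float = 4.0) -> torch.Tensor:
    """joints: (K, 2) estimated (x, y) keypoints in feature-map coordinates.
    Returns an (h, w) soft mask that is high near body joints and low on background."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    mask = torch.zeros(h, w)
    for x, y in joints:
        g = torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        mask = torch.maximum(mask, g)      # keep the strongest response per pixel
    return mask                            # apply as feat * mask to emphasize details
```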
Person re-identification (re-id) involves matching a person across non-overlapping views, with different poses, illuminations and conditions. Visual attributes are understandable semantic information that helps address issues including illumination changes, viewpoint variations and occlusions. This paper proposes an end-to-end deep learning framework for attribute-based person re-id. In the feature representation stage of the framework, an improved convolutional neural network (CNN) model is designed to leverage the information contained in automatically detected attributes and learned low-dimensional CNN features. Moreover, an attribute classifier is trained on separate data and its responses are included in the training process of our person re-id model. The coupled clusters loss function is used in the training stage of the framework, which enhances the discriminability of both types of features. The combined features are mapped into Euclidean space, where the L2 distance between any two pedestrians can be calculated to determine whether they are the same person. Extensive experiments validate the superiority and advantages of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
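Matching with the L2 distance in the learned Euclidean space amounts to computing pairwise distances between query and gallery embeddings and ranking them. A minimal sketch, with the threshold treated as a validation-time choice, is:

```python
import torch

def rank_gallery(query_feats: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """query_feats: (Q, D), gallery_feats: (G, D) combined CNN + attribute features.
    Returns (Q, G) indices of gallery images sorted from nearest to farthest."""
    dist = torch.cdist(query_feats, gallery_feats, p=2)  # Euclidean (L2) distances
    return dist.argsort(dim=1)                           # rank-1 match is column 0

# two pedestrian images are judged to share an identity when their L2 distance
# falls below a threshold chosen on validation data
```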
Person re-identification (re-ID) aims to match images of the same pedestrian across different cameras. It plays an important role in the field of security and surveillance. Although it has been studied for many years, it is still considered an unsolved problem. Since the rise of deep learning, the accuracy of supervised person re-ID on public datasets has reached the highest level. However, these methods are difficult to apply to real-life scenarios because they require a large amount of labeled training data. Pedestrian identity labeling, especially cross-camera pedestrian identity labeling, is laborious and expensive. Why can we not apply a pre-trained model directly to an unseen camera network? Because of the domain bias between the source and target environments, the accuracy on the target dataset is always low; for example, a model trained in a mall obviously needs to adapt to the new environment of an airport. Recently, several approaches have been proposed to solve this problem, including clustering-based methods, GAN-based methods, co-training methods and unsupervised domain adaptation methods.
Person re-identification (re-id) on robot platforms is an important application for human-robot interaction (HRI), which aims at making the robot recognize the persons around it in varying scenes. Although many effective methods have been proposed for surveillance re-id in recent years, re-id on robot platforms is still a novel, unsolved problem. Most existing methods adopt supervised metric learning offline to improve accuracy. However, these methods cannot adapt to unknown scenes. To solve this problem, an online re-id framework is proposed. Considering that robots can afford high-resolution RGB-D sensors and that clear human faces may be captured, face information is used to update the metric model. Firstly, the metric model is pre-trained offline using labeled data. Then, during the online stage, we use face information to mine incorrect body matching pairs, which are collected to update the metric model online. In addition, to make full use of both the appearance and skeleton information provided by RGB-D sensors, a novel feature funnel model (FFM) is proposed. Comparison studies show our approach is more effective and more adaptable to varying environments.
Person re-ID is becoming increasingly popular in the field of modern surveillance. The purpose of person re-ID is to retrieve persons of interest in a non-overlapping multi-camera surveillance system. Due to the complexity of surveillance scenes, the person images captured by cameras often suffer from problems such as size variation, rotation, occlusion and illumination differences, which bring great challenges to the study of person re-ID. In recent years, studies based on deep learning have achieved great success in person re-ID. Improvements to basic networks and a large number of studies on the influencing factors have greatly improved the accuracy of person re-ID. Recently, some studies utilize GANs to tackle the domain adaptation task by transferring person images of the source domain to the style of the target domain and have achieved state-of-the-art results in person re-ID.
Person re-identification (Re-ID) is integral to intelligent monitoring systems. However, due to variability in viewing angle and illumination, visual ambiguities arise easily, affecting the accuracy of person re-identification. An approach to person re-identification based on a feature mapping space and sample determination is proposed. First, a weight fusion model, including the mean and maximum values of the horizontal occurrence of local features, is introduced into the mapping space to optimize the local features. Then, a Gaussian distribution model with the hierarchical mean and covariance of pixel features is introduced to enhance feature expression. Finally, considering the influence of sample size on metric learning performance, an appropriate metric learning method is selected by the sample determination method to further improve the performance of person re-identification. Experimental results on the VIPeR, PRID450S and CUHK01 datasets demonstrate that the proposed method is better than traditional methods.
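One way to read the weight-fusion step, combining the mean and the maximum of the horizontal occurrence of local descriptors, is sketched below. The feature layout (one descriptor per horizontal position of a strip) and the fusion weight are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def fuse_horizontal_local_features(strip: np.ndarray, w: float = 0.5) -> np.ndarray:
    """strip: (H, D) local descriptors collected along one horizontal row
    (one D-dim descriptor per position). Returns a single fused (D,) descriptor
    combining the mean and the maximum horizontal occurrence."""
    mean_occ = strip.mean(axis=0)            # average occurrence along the row
    max_occ = strip.max(axis=0)              # strongest occurrence along the row
    return w * mean_occ + (1.0 - w) * max_occ
```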
Person re-identification (re-ID) is now a hot research topic in the field of computer vision, and it can be regarded as a sub-problem of image retrieval. The goal of person re-ID is, given a surveillance pedestrian image, to retrieve other images of that pedestrian across devices. At present, person re-ID methods are mainly divided into two categories. One is the traditional methods, which rely heavily on hand-crafted features. The other uses deep learning technology. Because traditional methods mainly rely on hand-crafted features, they cannot adapt well to complex environments with large amounts of data. In recent years, with the development of deep learning technology, a large number of person re-ID methods based on deep learning have been proposed, which greatly improves the accuracy of person re-ID.
Person re-identification (Re-ID) is a fundamental subject in the field of computer vision. Traditional methods of person Re-ID have difficulty solving the problems of illumination, occlusion and pose change under complex backgrounds. Meanwhile, the introduction of deep learning has opened a new avenue of person Re-ID research and become a hot spot in this field. This study reviews the traditional methods of person Re-ID, then focuses on papers about different deep learning-based person Re-ID frameworks and discusses their advantages and disadvantages. Finally, the authors propose directions for further research, especially the prospects of person Re-ID methods based on deep learning.
Recently, person re-identification (person Re-id) has attracted more and more attention and has become a research focus of the computer vision community. Person Re-id is used to ascertain whether the target pedestrians captured by cameras at different positions and different moments are the same person or not. However, due to the influence of various complex factors, person Re-id still has many gaps to be filled. In this paper, we first review the research progress of person Re-id; then, two kinds of mainstream methods for person Re-id are introduced according to the different types of training data they use. After that, we introduce specific methods for different kinds of person Re-id, including hand-crafted feature descriptor and metric learning-based methods as well as neural network-based methods. Then, some commonly used datasets and their performance evaluation criteria are introduced. Finally, we compare these methods to display their advantages and disadvantages. Last but not least, based on the current research status and development tendency, we offer a prospect for person Re-id research.
Person re-identification has been a hot research issue in the field of computer vision. In recent years, with the maturity of the theory, a large number of excellent methods have been proposed. However, large-scale datasets and huge networks make training a time-consuming process. At the same time, the parameters and their values generated during the training process also take up a lot of computer resources. Therefore, we apply a distributed cloud computing method to perform the person re-identification task. Using a distributed data storage method, pedestrian datasets and parameters are stored in cloud nodes. To speed up operational efficiency and increase fault tolerance, we add a data redundancy mechanism that copies and stores data blocks on different nodes, and we propose a hash loop optimization algorithm to optimize the data distribution process. Moreover, we assign different layers of the re-identification network to different nodes to complete the training in a model-parallel manner. By comparing and analyzing the accuracy and operation speed of the distributed model on the video-based dataset MARS, the results show that our distributed model has a faster training speed.
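The hash-loop distribution algorithm itself is not spelled out in the abstract. A standard consistent-hash ring with replication, which achieves the same goal of spreading data blocks over cloud nodes with redundancy, is sketched below purely as an illustrative stand-in; the node names and replica count are assumptions, and the paper's algorithm may differ.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring that maps data blocks to cloud nodes and stores
    each block on `replicas` successive nodes for fault tolerance."""
    def __init__(self, nodes, replicas: int = 2):
        self.replicas = replicas
        self.ring = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for(self, block_id: str):
        keys = [k for k, _ in self.ring]
        start = bisect.bisect(keys, self._h(block_id)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.replicas)]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("mars_block_0017"))  # two replica nodes chosen deterministically
```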
Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances but different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model to identify a person's locally significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
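One simple reading of the label-smoothing component, soft targets that mix the cluster pseudo-label with the label distribution of neighboring samples, is sketched below. The smoothing coefficient, the neighborhood size, and the use of plain cosine similarity are assumptions; the paper's dual cross-neighbor scheme operates on both global and local regions and is more elaborate.

```python
import torch
import torch.nn.functional as F

def neighbor_smoothed_targets(pseudo_labels: torch.Tensor,
                              feats: torch.Tensor,
                              num_classes: int,
                              eps: float = 0.2,
                              k: int = 5) -> torch.Tensor:
    """pseudo_labels: (N,) cluster ids; feats: (N, D). Each target mixes the
    one-hot pseudo-label with the label distribution of the k nearest neighbors."""
    one_hot = F.one_hot(pseudo_labels, num_classes).float()
    normed = F.normalize(feats, dim=1)
    sim = normed @ normed.t()                       # (N, N) cosine similarity
    sim.fill_diagonal_(-1.0)                        # exclude each sample itself
    _, nn_idx = sim.topk(k, dim=1)
    neighbor_dist = one_hot[nn_idx].mean(dim=1)     # (N, C) neighbor label distribution
    return (1.0 - eps) * one_hot + eps * neighbor_dist
```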
Person Search is a task involving pedestrian detection and person re-identification, aiming to retrieve person images matching a given objective attribute from a large-scale image library. Person Search models need to understand and capture the detailed features and context information of smaller objects in the image more accurately and comprehensively. The currently popular Person Search models, whether end-to-end or two-step, are based on anchor boxes. However, due to the limitations of anchors themselves, such models inevitably have disadvantages, such as the imbalance between positive and negative samples and redundant computation, which affect performance. To address the problem of fine-grained understanding of small target pedestrians in complex scenes, this paper proposes a Deformable-Attention-based Anchor-free Person Search model (DAAPS). Fully Convolutional One-Stage (FCOS), a classic anchor-free detector, is chosen as the model's infrastructure. The DAAPS model is the first to combine an anchor-free Person Search model with a deformable attention mechanism, which is applied to guide the model to adaptively adjust its perceptual field. The deformable attention mechanism helps the model focus on critical information and effectively mitigates the accuracy loss caused by the absence of anchor boxes. The experiments prove the adaptability of the attention mechanism to the anchor-free model. Besides, with an improved ResNeXt+ network frame, the DAAPS model adopts the Triplet-based Online Instance Matching (TOIM) loss function to achieve a more precise end-to-end Person Search task. Simulation experiments demonstrate that the proposed model has higher accuracy and better robustness than most Person Search models, reaching 95.0% mean Average Precision (mAP) and 95.6% Top-1 on the CUHK-SYSU dataset, and 48.6% mAP and 84.7% Top-1 on the Person Re-identification in the Wild (PRW) dataset, respectively.
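The TOIM loss combines online instance matching with a triplet term; the triplet component in its standard batch-hard form is sketched below as a reference point. The margin and mining strategy are assumptions, and the full TOIM loss additionally maintains an OIM-style lookup table that is not shown here.

```python
import torch

def batch_hard_triplet_loss(feats: torch.Tensor, labels: torch.Tensor, margin: float = 0.3) -> torch.Tensor:
    """feats: (B, D) embeddings, labels: (B,) identity ids. For each anchor,
    uses its hardest positive and hardest negative within the batch."""
    dist = torch.cdist(feats, feats, p=2)                       # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)           # (B, B) same-identity mask
    pos = dist.masked_fill(~same, float('-inf')).amax(dim=1)    # hardest positive
    neg = dist.masked_fill(same, float('inf')).amin(dim=1)      # hardest negative
    return torch.clamp(pos - neg + margin, min=0.0).mean()
```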