To address the poor re-identification performance of existing pedestrian recognition techniques and the low recognition accuracy of traditional methods, this paper proposes a feature fusion network that combines the CNN features extracted by ResNet with manually annotated attributes in a unified feature space. ResNet addresses the network degradation and multi-convergence problems that arise when training multi-layer CNNs, and it extracts deeper features. The manually annotated attributes are handled with an attribute combination method. Through back propagation, the hand-crafted features constrain the CNN features. A loss measurement function is then used to optimize the network's identification results. In further tests on the public datasets VIPeR, PRID, and CUHK, the method achieves a high cumulative matching score.
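The cumulative matching score reported above is usually read off a cumulative matching characteristic (CMC) curve. The following is a minimal sketch of that metric, not the paper's evaluation code; the function name `cmc_curve` and the toy identities are illustrative assumptions.

```python
def cmc_curve(ranked_gallery_ids, query_ids, max_rank):
    """Fraction of queries whose true identity appears within the top-k results.

    ranked_gallery_ids[i] is the gallery identity list for query i,
    already sorted from most to least similar (assumed to be given).
    """
    hits = [0] * max_rank
    for true_id, ranking in zip(query_ids, ranked_gallery_ids):
        for k, gallery_id in enumerate(ranking[:max_rank]):
            if gallery_id == true_id:
                # A first match at position k counts for rank k+1 and all larger ranks.
                for j in range(k, max_rank):
                    hits[j] += 1
                break
    return [h / len(query_ids) for h in hits]


# Toy example with three queries (hypothetical identities).
query_ids = ["a", "b", "c"]
rankings = [["a", "x", "y"], ["x", "b", "y"], ["x", "y", "z"]]
curve = cmc_curve(rankings, query_ids, max_rank=3)
```

Here `curve[0]` is the rank-1 score and `curve[k-1]` the rank-k score; the curve is non-decreasing by construction.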
The attention mechanism can extract salient features in images and has been proven effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules have two shortcomings. First, they mostly use global average pooling to generate context descriptors without highlighting the guiding role of salient information in descriptor generation, which limits the representational ability of the final attention mask. Second, most attention modules have complicated designs that greatly increase the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed in the SR block to further process the generated intermediate masks, both to apply a nonlinearity to the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the widely used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets. Notably, the mean Average Precision (mAP) on the Occluded-Duke dataset exceeds that of the state-of-the-art (SOTA) algorithm by 4.49%.
In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult because there is no supervision of the intermediate state, which makes smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the most suitable intermediate domain, so that the model can gradually adapt to the feature distribution changes from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. Our framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance, aligning the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which uses an attention mechanism to enhance the model's focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. Our proposed method outperforms existing methods in all common UDA re-ID tasks and improves the mean Average Precision (mAP) by 2.3% in the Market to Duke task compared to the state-of-the-art (SOTA) methods.
Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology in smart public facilities such as cities, campuses, and libraries. It aims to match pedestrians in visible light and infrared images for video surveillance, which poses a challenge in exploring cross-modal shared information accurately and efficiently. Therefore, multi-granularity feature learning methods have been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks, the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), incorporating the Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks. By incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features concerning body structure information. The Local Information Enhancement (LIE) module augments high-performance features in VI-ReID, and the multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID.
Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the massive personal data required for training contain private information with a significant risk of data leakage in cloud environments, and lead to significant communication overheads. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, a MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global models. In addition, we propose a lightweight person ReID network, named the multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
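The abstract does not state the model-contrastive loss explicitly. A minimal sketch of the commonly cited MOON form follows, assuming the local representation is pulled toward the global model's representation and pushed away from the previous local model's; the function names, the temperature value, and the toy vectors are illustrative assumptions rather than FRM's actual settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Contrastive loss: global representation is the positive, previous local one the negative."""
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))


# Toy representations of one sample under three models (hypothetical values).
z_local = [1.0, 0.0]
aligned = model_contrastive_loss(z_local, z_global=[1.0, 0.0], z_prev=[-1.0, 0.0])
neutral = model_contrastive_loss(z_local, z_global=[0.0, 1.0], z_prev=[0.0, 1.0])
```

When the positive and negative are equally similar to the local representation, the loss equals ln 2; it shrinks as the local representation moves toward the global one.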
Person re-identification is a prevalent technology deployed in intelligent surveillance. Person re-identification methods have achieved remarkable results under the assumption that all person images have a sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera makes the resolution of captured pedestrian images inconsistent. When low-resolution (LR) images in the query set are matched with high-resolution (HR) images in the gallery set, the performance of the pedestrian matching task degrades because critical pedestrian information is absent from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. First, we use the multi-resolution analysis principle of the wavelet transform to separately process the low-frequency and high-frequency regions of LR images, which restores the detail information lost in LR images. Then, we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR images and HR images to obtain pedestrian features that are invariant across resolutions. Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
Cross-modality pedestrian re-identification has important applications in the field of surveillance. Due to variations in posture, camera perspective, and camera modality, some salient pedestrian features fail to provide effective retrieval cues. Therefore, designing an effective strategy to extract more discriminative pedestrian detail becomes a challenge. Although many effective methods for detailed feature extraction have been proposed, they still have shortcomings in filtering background and modality noise. To further purify the features, a pure detail feature extraction network (PDFENet) is proposed for VI-ReID. PDFENet includes three modules: the adaptive detail mask generation module (ADMG), the inter-detail interaction module (IDI), and cross-modality cross-entropy (CMCE). ADMG and IDI use human joints and their semantic associations to suppress background noise in features. CMCE guides the model to ignore modality noise by generating modality-shared feature labels. Specifically, ADMG generates masks for pedestrian details based on pose estimation. The masks are used to suppress background information and enhance pedestrian detail information. In addition, IDI mines the semantic relations among details to further refine the features. Finally, CMCE cross-combines classifiers and features to generate modality-shared feature labels that guide model training. Extensive ablation experiments and visualization results demonstrate the effectiveness of PDFENet in eliminating background and modality noise. In addition, comparison experiments on two publicly available datasets show the competitiveness of our approach.
Visible-infrared person re-identification (VIPR) is a cross-modal retrieval task that searches for a target in a gallery captured by cameras of different spectrums. The severe challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images. To address this, this paper proposes a query related cluster (QRC) method for VIPR. First, an attention mechanism is used to calculate the similarity relation between a visible query and the infrared images with the same identity in the gallery. Second, the infrared images that share the query image's identity are aggregated using the similarity relation to form a dynamic clustering center corresponding to the query image. Third, a QRC loss function is designed to enlarge the similarity between the query image and its dynamic cluster center, achieving query related clustering and compacting the intra-class variations. Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can characterize intra-class variations in VIPR well. Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, achieving a 90.77% rank-1 identification rate on the RegDB dataset.
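The first two steps above can be sketched as attention-weighted aggregation. This is a minimal illustration under stated assumptions, not the QRC implementation: dot-product similarity with a softmax is assumed as the attention mechanism, and the function name `dynamic_cluster_center` and the toy features are hypothetical.

```python
import math


def dynamic_cluster_center(query, infrared_feats):
    """Aggregate same-identity infrared features into one query-specific center.

    Attention weights are a softmax over dot-product similarities (an assumption;
    the abstract does not specify the attention form).
    """
    scores = [sum(q * x for q, x in zip(query, feat)) for feat in infrared_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    center = [
        sum(w * feat[d] for w, feat in zip(weights, infrared_feats))
        for d in range(len(query))
    ]
    return center, weights


# Toy visible query and two infrared features of the same identity (hypothetical).
query = [1.0, 0.0]
infrared = [[1.0, 0.0], [0.0, 1.0]]
center, weights = dynamic_cluster_center(query, infrared)
```

Because the weights depend on the query, two queries of the same identity generally obtain different centers, which is what makes the center "dynamic". A QRC-style loss would then increase the similarity between `query` and `center`.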
Person re-identification (ReID) is a sub-problem of image retrieval. It is a technology that uses computer vision to identify a specific pedestrian in a collection of pictures or videos: given a monitored pedestrian image, it retrieves images of that pedestrian captured across devices. At present, most ReID methods deal with matching between visible images, but with the continuous improvement of security monitoring systems, more and more infrared cameras are used for monitoring at night or in dim light. Because of the imaging differences between infrared and RGB cameras, there is a huge visual difference between cross-modality images, so traditional ReID methods are difficult to apply in this setting. In view of this situation, studying pedestrian matching between the visible and infrared modalities is particularly crucial. Visible-infrared person re-identification (VI-ReID) was first proposed in 2017; it has since attracted increasing attention, and many advanced methods have emerged.
Accurate and continuous identification of individual cattle has become crucial to precision farming in recent years. It is also the prerequisite for monitoring the individual feed intake and feeding time of beef cattle at medium to long distances across different cameras. However, beef cattle tend to move frequently and change their feeding position during feeding. Furthermore, large variations in head direction and complex environments (light, occlusion, and background) make recognition difficult, particularly given the biological similarities among individual cattle. Among existing models, AlignedReID++ uses both global and local information for image matching. In particular, the dynamically matching local information (DMLI) algorithm is introduced into the local branch to automatically align horizontal local information. In this research, the AlignedReID++ model was adopted and improved to achieve better performance in cattle re-identification (ReID). First, triplet attention (TA) modules were integrated into the BottleNecks of the ResNet50 backbone, enhancing feature extraction through cross-dimensional interactions with minimal computational overhead. Although the TA modules increased the model size and floating point operations (FLOPs) of the AlignedReID++ baseline model by only 0.005 M and 0.05 G, they improved the rank-1 accuracy and mean average precision (mAP) by 1.0 and 2.94 percentage points, respectively. Specifically, the rank-1 accuracy with TA modules was 0.86 and 0.12 percentage points higher than with the convolution block attention module (CBAM) and efficient channel attention (ECA) modules, respectively, although 0.94 percentage points lower than with squeeze-and-excitation (SE) modules. The mAP was 0.22, 0.86, and 0.12 percentage points higher than with the SE, CBAM, and ECA modules, respectively. Additionally, the Cross-Entropy Loss function was replaced with the CosFace Loss function in the global branch of the baseline model. CosFace Loss and Hard Triplet Loss were jointly employed to train the baseline model for better identification of similar individuals. AlignedReID++ with CosFace Loss outperformed the baseline model by 0.24 and 0.92 percentage points in rank-1 accuracy and mAP, respectively, and AlignedReID++ with ArcFace Loss by 0.36 and 0.56 percentage points, respectively. The improved model with the TA modules and CosFace Loss achieved a rank-1 accuracy of 94.42%, rank-5 accuracy of 98.78%, rank-10 accuracy of 99.34%, mAP of 63.90%, FLOPs of 5.45 G, frames per second (FPS) of 5.64, and model size of 23.78 M. Its rank-1 accuracy exceeded those of the baseline model, part-based convolutional baseline (PCB), multiple granularity network (MGN), and relation-aware global attention (RGA) by 1.84, 4.72, 0.76, and 5.36 percentage points, respectively, while its mAP exceeded theirs by 6.42, 5.86, 4.30, and 7.38 percentage points, respectively. Meanwhile, its rank-1 accuracy was 0.98 percentage points lower than that of TransReID, but its mAP was 3.90 percentage points higher. Moreover, the FLOPs of the improved model were only 0.05 G larger than those of the baseline model, and smaller than those of PCB, MGN, RGA, and TransReID by 0.68, 6.51, 25.4, and 16.55 G, respectively. The model size of the improved model was 23.78 M, which was smaller than those of the baseline model, PCB, MGN, RGA, and TransReID by 0.03, 2.33, 45.06, 14.53, and 62.85 M, respectively. The inference speed of the improved model on a CPU was lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. The t-SNE feature embedding visualization demonstrated that the global and local features achieved better intra-class compactness and inter-class variability. Therefore, the improved model can be expected to effectively re-identify beef cattle in the natural environments of breeding farms, in order to monitor individual feed intake and feeding time.
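The swap from Cross-Entropy Loss to CosFace Loss can be illustrated with a minimal sketch of the standard large-margin cosine form, in which a margin is subtracted from the target-class cosine before scaling. The scale and margin values, the function name, and the toy cosines are illustrative assumptions; the abstract does not report the settings used.

```python
import math


def cosface_loss(cosines, target, scale=30.0, margin=0.35):
    """Cross-entropy over scaled cosines, with the margin subtracted from the target class.

    cosines[j] is the cosine between the normalized feature and class j's
    normalized weight vector (assumed to be computed upstream).
    """
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_norm - logits[target]


# Toy cosines for one sample over three identities (hypothetical values).
cosines = [0.8, 0.1, -0.2]
plain = cosface_loss(cosines, target=0, margin=0.0)     # ordinary scaled softmax loss
margined = cosface_loss(cosines, target=0, margin=0.35)  # CosFace-style loss
```

With a positive margin, the same sample incurs a larger loss, so training must separate the target cosine from the others by at least the margin, which is the property that helps with visually similar individuals.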
Pedestrian self-organizing movement plays a significant role in evacuation studies and architectural design. Lane formation, a typical self-organizing phenomenon, helps a pedestrian system become more orderly. However, most models of following behavior and overtaking behavior are imprecise and unrealistic compared with pedestrian movement in the real world. In this study, a pedestrian dynamic model with detailed modelling of following and overtaking behavior is constructed, and a method of measuring lane formation and pedestrian system order based on information entropy is proposed. Simulation and analysis demonstrate that following and avoidance behaviors are important factors in lane formation. A high tendency to follow results in good lane formation. Both non-selective following behavior and aggressive overtaking behavior decrease the system order. The most orderly strategy is for a pedestrian to overtake the pedestrian ahead only when that pedestrian's speed is lower than approximately 70% of his own. The influence of the obstacle layout on pedestrian lanes and egress efficiency is also studied with this model. The presence of a small obstacle does not obstruct the walking of pedestrians; on the contrary, it may help to improve egress efficiency by guiding the pedestrian flow and mitigating the reduction of pedestrian system orderliness.
Person re-identification has emerged as a hotspot of computer vision research due to growing public safety demands and the rapid development of intelligent surveillance networks. Person re-identification (Re-ID) in a video surveillance system can track and identify suspicious people, and track and statistically analyze persons. The purpose of person re-identification is to recognize the same person across different cameras. With the growing popularity of deep learning, research on deep learning-based person re-identification has produced numerous remarkable outcomes. The purpose of this paper is to help researchers better understand where person re-identification research currently stands and where it is headed. First, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
Walkability is an essential aspect of urban transportation systems. Properly designed walking paths can enhance transportation safety, encourage pedestrian activity, and improve community quality of life. This, in turn, can help achieve sustainable development goals in urban areas. This pilot study uses wearable technology data to present a new method for measuring pedestrian stress in urban environments, and the results are presented as an interactive geographic information system map to support risk-informed decision-making. The approach involves analyzing data from wearable devices using heart rate variability (RMSSD and slope analysis) to identify high-stress locations. This data-driven approach can help urban planners and safety experts identify and address pedestrian stressors, ultimately creating safer, more walkable cities. The study addresses a significant challenge in pedestrian safety by providing insights into factors and locations that trigger stress in pedestrians. During the pilot study, high-stress pedestrian experiences were identified due to issues such as pedestrian-scooter interaction on pedestrian paths, pedestrian behavior around high foot traffic areas, and poor visibility at pedestrian crossings due to inadequate lighting.
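RMSSD, the heart rate variability measure named above, is the root mean square of successive differences between adjacent heartbeat (RR) intervals. A minimal sketch of the standard definition follows; the function name and the sample intervals are illustrative, and the study's windowing and slope analysis are not reproduced here.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (milliseconds)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Four consecutive RR intervals in milliseconds (hypothetical values).
value = rmssd([800, 810, 790, 800])
```

Lower RMSSD over a window is commonly interpreted as reduced parasympathetic activity, which is why a drop along a walking route can flag a potentially stressful location.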
Pedestrian positioning systems (PPS) using wearable inertial sensors have wide applications in emerging fields such as smart healthcare, emergency rescue, and soldier positioning. The performance of a traditional PPS is limited by the cumulative error of inertial sensors, the complex motion modes of pedestrians, and the low robustness of the multi-sensor collaboration structure. This paper presents a hybrid pedestrian positioning system that combines wearable inertial sensors and ultrasonic ranging (H-PPS). A robust two-node integration structure is developed to adaptively combine the motion data acquired from a single waist-mounted node and a foot-mounted node, and it is enhanced by a novel ellipsoid constraint model. In addition, a deep-learning-based walking speed estimator is proposed that considers all the motion features provided by the different nodes, which effectively reduces the cumulative error originating from inertial sensors. Finally, a comprehensive data and model dual-driven model is presented to effectively combine the motion data provided by the different sensor nodes and the walking speed estimator, and multi-level constraints are extracted to further improve the performance of the overall system. Experimental results indicate that the proposed H-PPS significantly improves on the performance of a single PPS and outperforms existing algorithms in accuracy under complex indoor scenarios.
Road traffic safety can decrease when drivers drive in a low-visibility environment. Applying visual perception technology to detect vehicles and pedestrians in infrared images is an effective means of reducing the risk of accidents. To tackle the low recognition accuracy and substantial computational burden of current infrared pedestrian-vehicle detection methods, an infrared pedestrian-vehicle detection method based on an enhanced version of You Only Look Once version 5 (YOLOv5) is proposed. First, a head specifically designed for detecting small targets is integrated into the model to make full use of shallow feature information and improve accuracy on small targets. Second, the Focal Generalized Intersection over Union (GIoU) is employed in place of the original loss function to address issues related to target overlap and category imbalance. Third, distribution shift convolution is used to optimize the feature extraction operator, alleviating the computational burden of the model without significantly compromising detection accuracy. The test results show that the mean average precision (mAP) of the improved algorithm reaches 90.1%, while its giga floating-point operations (GFLOPs) are only 9.1. Compared with algorithms of similar GFLOPs, such as YOLOv6n (11.9), YOLOv8n (8.7), YOLOv7t (13.2), and YOLOv5s (16.0), its mAP is higher by 4.4%, 3%, 3.5%, and 1.7%, respectively, showing that the improved algorithm achieves higher accuracy in target detection tasks under similar computational resource overhead. On the other hand, compared with algorithms such as YOLOv8l (91.1%), YOLOv6l (89.5%), YOLOv7 (90.8%), and YOLOv3 (90.1%), the improved algorithm needs only 5.5%, 2.3%, 8.6%, and 2.3%, respectively, of the GFLOPs. The improved algorithm balances accuracy and computational efficiency well, making it promising for practical use in resource-limited scenarios.
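The GIoU term underlying the loss mentioned above extends IoU with a penalty based on the smallest box enclosing both boxes. The following minimal sketch shows plain GIoU for axis-aligned boxes only; the focal weighting used in the paper is not reproduced, and the function name and box values are illustrative.

```python
def giou(box_a, box_b):
    """Generalized IoU for boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclosing - union) / enclosing


same = giou((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0))
apart = giou((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
giou_loss = 1.0 - apart  # the usual loss form is 1 - GIoU
```

Unlike IoU, GIoU stays informative for non-overlapping boxes: it becomes more negative as the boxes move apart, so the loss still provides a gradient direction.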
With the development of positioning technology, location services are in constant demand. As a primary location service, pedestrian navigation has two main approaches, based on radio and on inertial navigation. Radio-based pedestrian navigation is subject to environmental occlusion, which degrades positioning accuracy. Pedestrian navigation based on a micro-electro-mechanical system inertial measurement unit (MIMU) is less susceptible to environmental interference, but its errors diverge over time. In this paper, a chest card pedestrian navigation improvement method based on complementary correction is proposed to suppress the error divergence of inertial navigation methods. To suppress attitude errors, optimal feedback coefficients are established from pedestrian motion characteristics. To extend navigation time and improve positioning accuracy, the step length in subsequent movements is compensated by the first step length. The experimental results show that the positioning accuracy of the proposed method is improved by more than 47% and 44% compared with the pure inertia-based method with step compensation and the traditional complementary filtering method with step compensation, respectively. The proposed method can effectively suppress error divergence and improve positioning accuracy.
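For context, the traditional complementary filter that serves as a comparison baseline blends a drifting but smooth gyroscope integral with a noisy but drift-free accelerometer angle. The sketch below is a generic single-axis version under stated assumptions, not the paper's method: the fixed coefficient `alpha` stands in for the feedback coefficients that the paper derives from pedestrian motion characteristics, and all names and values are illustrative.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, angle0=0.0):
    """Fuse gyroscope rates (deg/s) and accelerometer angles (deg) into one attitude angle.

    alpha weights the integrated gyroscope path; (1 - alpha) weights the
    accelerometer path that corrects long-term drift.
    """
    angle = angle0
    history = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle
        history.append(angle)
    return history


# Stationary gyroscope, accelerometer reporting a constant 10 degree tilt (hypothetical).
estimates = complementary_filter([0.0] * 3, [10.0] * 3, dt=0.01)
```

In this toy run the estimate moves toward 10 degrees by a factor of `1 - alpha` per step, which shows how the accelerometer path slowly removes accumulated attitude error.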
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model's detection speed while continuously improving cross-modal feature extraction and fusion is also challenging. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model's detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model's focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model's parameter count and computational load through channel-wise and point-wise convolutions. The proposed network model was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
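The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The sketch below compares weight counts for a single layer, ignoring biases; the 3x3 kernel and 256-channel sizes are illustrative assumptions and are not taken from the paper.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a channel-wise (depthwise) convolution followed by a 1x1 point-wise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = conv_params(3, 256, 256)
separable = depthwise_separable_params(3, 256, 256)
ratio = separable / standard
```

For these sizes the standard layer has 589,824 weights and the separable one 67,840, a ratio of about 0.115, which matches the general expression 1/c_out + 1/kernel².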
This study explores the challenges posed by pedestrian detection and occlusion in AR applications, employing a novel approach that uses RGB-D-based skeleton reconstruction to reduce the training overhead of classical pedestrian detection algorithms. Furthermore, it addresses occlusion issues in pedestrian detection by using Azure Kinect for body tracking and integrating a robust occlusion management algorithm, significantly enhancing detection efficiency. In experiments, an average latency of 204 milliseconds was measured, and the detection accuracy reached 97%. Additionally, this approach has been successfully applied in creating a simple yet captivating augmented reality game, demonstrating the practical application of the algorithm.
Traffic intersections are incredibly dangerous for drivers and pedestrians. Statistics from both Canada and the U.S. show a high number of fatalities and serious injuries related to crashes at intersections. In Canada, during 2019, the National Collision Database shows that 28% of traffic fatalities and 42% of serious injuries occurred at intersections. Likewise, the U.S. National Highway Traffic Safety Administration (NHTSA) found that about 40% of the estimated 5,811,000 accidents in the U.S. during the year studied were intersection-related crashes. In fact, a major survey by the car insurance industry found that nearly 85% of drivers could not identify the correct action to take when approaching a yellow traffic light at an intersection. One major reason for these accidents is the "yellow light dilemma," the ambiguous situation in which a driver must decide whether to stop or proceed when unexpectedly faced with a yellow light. This situation is further exacerbated by the tendency of aggressive drivers to inappropriately speed up on the yellow just to get through the traffic light. A survey of Canadian drivers conducted by the Traffic Injury Research Foundation found that 9% of drivers admitted to speeding up to get through a traffic light. Another reason for these accidents is the increased danger of making a left-hand turn on yellow. According to the NHTSA, left turns occur in approximately 22.2% of collisions, as opposed to just 1.2% for right turns. Moreover, a study by CNN found that left turns are three times as likely to kill pedestrians as right turns. Left turns are so much more likely to cause an accident because they take a driver against traffic and into the path of oncoming cars. Additionally, most of these left turns occur at the driver's discretion, as opposed to during the distressingly brief left-hand arrow at busy intersections. Drive Safe Now proposes a workable solution for reducing the number of accidents occurring during a yellow light at intersections. We believe this fairly simple solution will save lives, prevent injuries, reduce damage to public and private property, and decrease insurance costs.
Vehicle re-identification (ReID) aims to retrieve a target vehicle from a large image gallery using its appearance from different views in a cross-camera scenario. It has gradually become a core technology of intelligent transportation systems. Most existing vehicle re-identification models jointly learn global and local features. However, they use the extracted global features directly, resulting in insufficient feature expression. Moreover, local features are primarily obtained through advanced annotation and complex attention mechanisms, which incur additional cost. To address these issues, this paper proposes a multi-feature learning model with enhanced local attention for vehicle re-identification (MFELA). The model consists of a global branch and a local branch. The global branch uses both middle- and high-level semantic features of ResNet50 to enhance the global representation capability, and applies multi-scale pooling operations to obtain multi-scale information. The local branch uses the proposed Region Batch Dropblock (RBD), which encourages the model to learn discriminative features for different local regions and, during training, randomly drops the same corresponding areas across a batch to strengthen attention to local regions. Features from both branches are then combined to provide a more comprehensive and distinctive feature representation. Extensive experiments on the VeRi-776 and VehicleID datasets demonstrate that the method achieves excellent performance.
Abstract: Existing pedestrian re-identification techniques perform poorly, and traditional methods achieve low recognition accuracy. To address this, this paper proposes a feature fusion network that combines the CNN features extracted by ResNet with manually annotated attributes in a unified feature space. ResNet addresses the problems of network degradation and multi-convergence in multi-layer CNN training and extracts deeper features. The manually annotated attributes are handled with an attribute combination method. Through back propagation, the CNN features are constrained by the hand-crafted features. A loss measurement function is then used to optimize the network's identification results. In further tests on the public datasets VIPeR, PRID, and CUHK, the experimental results show that the method achieves a high cumulative matching score.
Funding: Supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant Nos. 2022D01B186 and 2022D01B05).
Abstract: The attention mechanism can extract salient features from images and has proved effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules have two shortcomings. First, they mostly use global average pooling to generate context descriptors without highlighting the guiding role of salient information in descriptor generation, so the final attention mask has insufficient representational ability. Second, most attention modules are complicated in design, which greatly increases the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed in the SR block to further process the generated intermediate masks, both to apply nonlinearity to the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the widely used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets; in particular, the mean Average Precision (mAP) on the Occluded-Duke dataset exceeds that of the state-of-the-art (SOTA) algorithm by 4.49%.
Abstract: In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult because intermediate states lack supervision, which makes smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the most suitable intermediate domain, so that the model gradually adapts to the feature distribution changes from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. The framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance, achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which uses an attention mechanism to enhance the model's focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. The proposed method outperforms existing methods in all common UDA re-ID tasks and improves the mean Average Precision (mAP) by 2.3% on the Market-to-Duke task compared with state-of-the-art (SOTA) methods.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62177029 and 62307025, in part by the Startup Foundation for Introducing Talent of Nanjing University of Posts and Communications under Grant NY221041, and in part by the General Project of the Natural Science Foundation of Jiangsu Higher Education Institution of China (22KJB520025, 23KJD580).
Abstract: Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology in smart public facilities such as cities, campuses, and libraries. It aims to match pedestrians across visible-light and infrared images for video surveillance, which makes it challenging to explore cross-modal shared information accurately and efficiently. Multi-granularity feature learning methods have therefore been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks, the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), which incorporates the Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks. By incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features concerning body structure information. The Local Information Enhancement (LIE) module augments high-performance features in VI-ReID, and the multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID.
Funding: Supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20211284 and the Financial and Science Technology Plan Project of Xinjiang Production and Construction Corps under Grant No. 2020DB005.
Abstract: Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the massive personal data required for training contain private information, which carries a significant risk of data leakage in cloud environments and also leads to significant communication overhead. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, a MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between the local and global models. In addition, we propose a lightweight person ReID network, named the multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet): we add a part branch and a global branch to CDNet and introduce an attention pyramid to improve the performance of the model. Experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than the existing baseline.
Funding: Supported by the National Natural Science Foundation of China (61471154, 61876057) and the Key Research and Development Program of Anhui Province, Special Project of Strengthening Science and Technology Police (202004D07020012).
Abstract: Person re-identification is a prevalent technology in intelligent surveillance. Person re-identification methods have achieved remarkable results under the assumption that all person images have sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera makes the resolution of captured pedestrian images inconsistent. When low-resolution (LR) images in the query set are matched with high-resolution (HR) images in the gallery set, matching performance degrades because critical pedestrian information is missing from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. First, we use the multi-resolution analysis principle of the wavelet transform to process the low-frequency and high-frequency regions of LR images separately, which restores the lost detail information of LR images. Then, we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR and HR images to obtain pedestrian features that are invariant across resolutions. Extensive qualitative and quantitative experiments on four benchmark datasets verify the superiority of the proposed approach.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61906168 and 62202429), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY23F020023), and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (2022SDSJ01).
Abstract: Cross-modality pedestrian re-identification has important applications in the field of surveillance. Due to variations in posture, camera perspective, and camera modality, some salient pedestrian features fail to provide effective retrieval cues. It is therefore a challenge to design an effective strategy for extracting more discriminative pedestrian details. Although many effective methods for detailed feature extraction have been proposed, they still have shortcomings in filtering background and modality noise. To further purify the features, a pure detail feature extraction network (PDFENet) is proposed for VI-ReID. PDFENet includes three modules: the adaptive detail mask generation module (ADMG), the inter-detail interaction module (IDI), and cross-modality cross-entropy (CMCE). ADMG and IDI use human joints and their semantic associations to suppress background noise in features. CMCE guides the model to ignore modality noise by generating modality-shared feature labels. Specifically, ADMG generates masks for pedestrian details based on pose estimation; the masks are used to suppress background information and enhance pedestrian detail information. IDI then mines the semantic relations among details to further refine the features. Finally, CMCE cross-combines classifiers and features to generate modality-shared feature labels that guide model training. Extensive ablation experiments and visualization results demonstrate the effectiveness of PDFENet in eliminating background and modality noise. In addition, comparison experiments on two publicly available datasets show the competitiveness of the approach.
Funding: Supported by the National Natural Science Foundation of China (No. 61976098) and the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (No. 2022J06023).
Abstract: Visible-infrared person re-identification (VIPR) is a cross-modal retrieval task that searches for a target in a gallery captured by cameras of different spectra. The main challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images. To address it, this paper proposes a query related cluster (QRC) method for VIPR. First, an attention mechanism is used to calculate the similarity relation between a visible query and the infrared images with the same identity in the gallery. Second, these infrared images are aggregated using the similarity relation to form a dynamic clustering center corresponding to the query image. Third, a QRC loss function is designed to increase the similarity between the query image and its dynamic cluster center, achieving query related clustering and thereby compacting the intra-class variations. Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can characterize intra-class variations in VIPR well. Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, achieving a 90.77% rank-1 identification rate on the RegDB dataset.
Abstract: Person re-identification (ReID) is a sub-problem of image retrieval. It uses computer vision to identify a specific pedestrian in a collection of pictures or videos. The cross-device pedestrian images are taken from monitored pedestrian images. At present, most ReID methods deal with matching between visible images, but with the continuous improvement of security monitoring systems, more and more infrared cameras are used for monitoring at night or in dim light. Because of the imaging differences between infrared and RGB cameras, there is a large visual gap between cross-modality images, so traditional ReID methods are difficult to apply in this setting. Studying pedestrian matching between the visible and infrared modalities is therefore particularly important. Visible-infrared person re-identification (VI-ReID) was first proposed in 2017 and has since attracted increasing attention, with many advanced methods emerging.
Funding: National Key Research and Development Program (2023YFD1301801); National Natural Science Foundation of China (32272931); Shaanxi Province Agricultural Key Core Technology Project (2024NYGG005); Shaanxi Province Key R&D Program (2024NC-ZDCYL-05-12).
Abstract: Accurate and continuous identification of individual cattle is crucial to precision farming. It is also a prerequisite for monitoring the individual feed intake and feeding time of beef cattle at medium to long distances across different cameras. However, beef cattle tend to move frequently and change their feeding position during feeding. Furthermore, large variations in head direction and complex environments (light, occlusion, and background) make recognition difficult, particularly given the biological similarity among individual cattle. The AlignedReID++ model uses both global and local information for image matching; in particular, the dynamically matching local information (DMLI) algorithm is introduced into the local branch to automatically align horizontal local information. In this research, the AlignedReID++ model was adopted and improved to achieve better performance in cattle re-identification (ReID). First, triplet attention (TA) modules were integrated into the BottleNecks of the ResNet50 backbone, enhancing feature extraction through cross-dimensional interactions with minimal computational overhead. Adding the TA modules to the AlignedReID++ baseline model increased the model size and floating point operations (FLOPs) by 0.005 M and 0.05 G, while improving the rank-1 accuracy and mean average precision (mAP) by 1.0 and 2.94 percentage points, respectively. Specifically, the rank-1 accuracy was 0.86 and 0.12 percentage points higher than with the convolution block attention module (CBAM) and efficient channel attention (ECA) modules, respectively, although 0.94 percentage points lower than with squeeze-and-excitation (SE) modules. The mAP was 0.22, 0.86, and 0.12 percentage points higher than with the SE, CBAM, and ECA modules, respectively. Additionally, the Cross-Entropy Loss function was replaced with the CosFace Loss function in the global branch of the baseline model, and CosFace Loss and Hard Triplet Loss were jointly employed to train the model for better identification of similar individuals. AlignedReID++ with CosFace Loss outperformed the baseline model by 0.24 and 0.92 percentage points in rank-1 accuracy and mAP, respectively, whereas AlignedReID++ with ArcFace Loss exceeded it by 0.36 and 0.56 percentage points, respectively. The improved model with the TA modules and CosFace Loss achieved a rank-1 accuracy of 94.42%, a rank-5 accuracy of 98.78%, a rank-10 accuracy of 99.34%, an mAP of 63.90%, FLOPs of 5.45 G, 5.64 frames per second (FPS), and a model size of 23.78 M. Its rank-1 accuracy exceeded that of the baseline model, the part-based convolutional baseline (PCB), the multiple granularity network (MGN), and relation-aware global attention (RGA) by 1.84, 4.72, 0.76, and 5.36 percentage points, respectively, and its mAP exceeded theirs by 6.42, 5.86, 4.30, and 7.38 percentage points, respectively. Its rank-1 accuracy was 0.98 percentage points lower than that of TransReID, but its mAP was 3.90 percentage points higher. Moreover, the FLOPs of the improved model were only 0.05 G larger than those of the baseline model, and smaller than those of PCB, MGN, RGA, and TransReID by 0.68, 6.51, 25.4, and 16.55 G, respectively. The model size of the improved model was 23.78 M, smaller than those of the baseline model, PCB, MGN, RGA, and TransReID by 0.03, 2.33, 45.06, 14.53, and 62.85 M, respectively. The inference speed of the improved model on a CPU was lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. The t-SNE feature embedding visualization showed that the global and local features achieved better intra-class compactness and inter-class variability. The improved model can therefore be expected to effectively re-identify beef cattle in the natural environment of a breeding farm, in order to monitor individual feed intake and feeding time.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 71603146).
Abstract: Pedestrian self-organizing movement plays a significant role in evacuation studies and architectural design. Lane formation, a typical self-organizing phenomenon, helps a pedestrian system become more orderly. However, most models of following and overtaking behavior are imprecise and unrealistic compared with pedestrian movement in the real world. In this study, a pedestrian dynamic model with detailed modelling of following and overtaking behavior is constructed, and a method for measuring lane formation and pedestrian system order based on information entropy is proposed. Simulation and analysis demonstrate that following and avoidance behaviors are important factors in lane formation. A high tendency to follow results in good lane formation. Both non-selective following behavior and aggressive overtaking behavior cause the system order to decrease. The most orderly following strategy for a pedestrian is to overtake a preceding pedestrian whose speed is lower than approximately 70% of their own. The influence of the obstacle layout on pedestrian lanes and egress efficiency is also studied with this model. The presence of a small obstacle does not obstruct the walking of pedestrians; on the contrary, it may help to improve egress efficiency by guiding the pedestrian flow and mitigating the reduction of pedestrian system orderliness.
Abstract: Person re-identification has emerged as a hotspot for computer vision research due to the growing demands of public safety and the rapid development of intelligent surveillance networks. In video surveillance systems, person re-identification (Re-ID) can be used to track and identify suspicious people and to track and statistically analyze persons. The purpose of person re-identification is to recognize the same person across different cameras. With the growing popularity of deep learning, deep learning-based person re-identification research has produced numerous remarkable outcomes. The purpose of this paper is to help researchers better understand where person re-identification research currently stands and where it is headed. First, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
Abstract: Walkability is an essential aspect of urban transportation systems. Properly designed walking paths can enhance transportation safety, encourage pedestrian activity, and improve community quality of life. This, in turn, can help achieve sustainable development goals in urban areas. This pilot study uses wearable technology data to present a new method for measuring pedestrian stress in urban environments; the results are presented as an interactive geographic information system map to support risk-informed decision-making. The approach analyzes data from wearable devices using heart rate variability (RMSSD and slope analysis) to identify high-stress locations. This data-driven approach can help urban planners and safety experts identify and address pedestrian stressors, ultimately creating safer, more walkable cities. The study addresses a significant challenge in pedestrian safety by providing insights into the factors and locations that trigger stress in pedestrians. During the pilot study, high-stress pedestrian experiences were identified that arose from issues such as pedestrian-scooter interaction on pedestrian paths, pedestrian behavior around high foot traffic areas, and poor visibility at pedestrian crossings due to inadequate lighting.
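As a hedged illustration of the heart rate variability measure named in the abstract above, the sketch below computes RMSSD (the root mean square of successive differences between RR intervals) and applies it over a sliding window. The function names, the window size, and the sample data are assumptions made for illustration; the study's actual pipeline, including its slope analysis, is not described in the abstract. Lower RMSSD is commonly read as a sign of higher physiological stress, which is why windowed values could in principle be mapped to locations.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def windowed_rmssd(rr_intervals_ms, window=5):
    """RMSSD over each run of `window` consecutive intervals.

    Hypothetical helper: the window length is an assumption, not a value
    taken from the study.
    """
    return [
        rmssd(rr_intervals_ms[i:i + window])
        for i in range(len(rr_intervals_ms) - window + 1)
    ]


# Made-up RR intervals in milliseconds, for illustration only.
sample = [800, 810, 790, 800, 805, 795, 800]
values = windowed_rmssd(sample, window=4)
```

Each entry of `values` summarizes one stretch of the walk; pairing those stretches with position data would be a separate step that this sketch does not attempt.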
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52175531) and in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJQN202000605 and KJZD-M202000602).
Abstract: Pedestrian positioning systems (PPS) using wearable inertial sensors have wide applications in emerging fields such as smart healthcare, emergency rescue, and soldier positioning. The performance of traditional PPS is limited by the cumulative error of inertial sensors, the complex motion modes of pedestrians, and the low robustness of the multi-sensor collaboration structure. This paper presents a hybrid pedestrian positioning system (H-PPS) that combines wearable inertial sensors and ultrasonic ranging. A robust two-node integration structure is developed to adaptively combine the motion data acquired from a single waist-mounted node and a foot-mounted node, and is enhanced by a novel ellipsoid constraint model. In addition, a deep-learning-based walking speed estimator is proposed that considers all the motion features provided by the different nodes, which effectively reduces the cumulative error originating from inertial sensors. Finally, a comprehensive data and model dual-driven model is presented to effectively combine the motion data provided by the different sensor nodes and the walking speed estimator, and multi-level constraints are extracted to further improve the performance of the overall system. Experimental results indicate that the proposed H-PPS significantly improves on the performance of a single PPS and outperforms existing algorithms in accuracy under complex indoor scenarios.
Abstract: Road traffic safety can decrease when drivers drive in a low-visibility environment. Applying visual perception technology to detect vehicles and pedestrians in infrared images is an effective means of reducing the risk of accidents. To tackle the low recognition accuracy and substantial computational burden of current infrared pedestrian-vehicle detection methods, an infrared pedestrian-vehicle detection method based on an enhanced version of You Only Look Once version 5 (YOLOv5) is proposed. First, a head specifically designed for detecting small targets is integrated into the model to make full use of shallow feature information and enhance the accuracy of small-target detection. Second, the Focal Generalized Intersection over Union (GIoU) is employed as an alternative to the original loss function to address issues related to target overlap and category imbalance. Third, the distribution shift convolution optimization feature extraction operator is used to alleviate the computational burden of the model without significantly compromising detection accuracy. Test results show that the average accuracy (mAP) of the improved algorithm reaches 90.1%, while its Giga Floating Point Operations Per second (GFLOPs) is only 9.1. Compared with algorithms of similar GFLOPs, such as YOLOv6n (11.9), YOLOv8n (8.7), YOLOv7t (13.2), and YOLOv5s (16.0), its mAP is 4.4%, 3%, 3.5%, and 1.7% higher, respectively, showing that the improved algorithm achieves higher accuracy in target detection tasks under similar computational resource overhead. On the other hand, compared with other algorithms such as YOLOv8l (91.1%), YOLOv6l (89.5%), YOLOv7 (90.8%), and YOLOv3 (90.1%), the improved algorithm needs only 5.5%, 2.3%, 8.6%, and 2.3%, respectively, of the GFLOPs. The improved algorithm shows significant advances in balancing accuracy and computational efficiency, making it promising for practical use in resource-limited scenarios.
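The abstract above names a Focal GIoU loss. As a rough sketch of the underlying quantity only, the following computes plain Generalized Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2), and the corresponding loss 1 - GIoU. The focal weighting used in the paper is not reproduced here, and the function names are assumptions.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width or height clamps to zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or enclose <= 0:
        raise ValueError("boxes must have positive area")

    iou = inter / union
    # Penalize the empty part of the enclosing box.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: two unit boxes separated by a one-unit gap score -1/3 rather than 0, so the loss still distinguishes near misses from far misses.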
Abstract: With the development of positioning technology, location services are in constant demand. As a primary location service, pedestrian navigation has two main approaches, based on radio and on inertial navigation. Radio-based pedestrian navigation is subject to environmental occlusion, which degrades positioning accuracy. Pedestrian navigation based on a micro-electro-mechanical system inertial measurement unit (MIMU) is less susceptible to environmental interference, but its errors diverge over time. In this paper, a chest card pedestrian navigation improvement method based on complementary correction is proposed to suppress the error divergence of inertial navigation methods. To suppress attitude errors, optimal feedback coefficients are established from pedestrian motion characteristics. To extend navigation time and improve positioning accuracy, the step length in subsequent movements is compensated by the first step length. The experimental results show that the positioning accuracy of the proposed method is improved by more than 47% and 44% compared with the pure inertia-based method combined with step compensation and the traditional complementary filtering method combined with step compensation, respectively. The proposed method can effectively suppress the error divergence and improve the positioning accuracy.
Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055, and 222102210214; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29; and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model's detection speed while continuously improving cross-modal feature extraction and fusion remains a challenge. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model's detection efficiency. The model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model's focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model's parameter count and computational load through channel-wise and point-wise convolutions. The proposed network model was experimentally validated on the publicly available KAIST dataset and compared with existing methods. The experimental results demonstrate that the approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
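To make the parameter saving from depthwise separable convolution concrete, the sketch below counts weights for a standard convolution and for its factorization into a depthwise (channel-wise) step followed by a point-wise 1x1 step. Biases are ignored, and the layer sizes are arbitrary examples rather than values from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k step plus a point-wise 1 x 1 step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Example layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)   # 73728
sep = separable_conv_params(k, c_in, c_out)  # 8768
ratio = sep / std                            # equals 1/c_out + 1/k**2
```

For this example the factorized layer needs roughly 12% of the weights of the standard one, and the ratio 1/c_out + 1/k² shows that the saving is dominated by the kernel size once the output channel count is large.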
Abstract: This study explores the challenges posed by pedestrian detection and occlusion in AR applications, using RGB-D-based skeleton reconstruction to reduce the training overhead of classical pedestrian detection algorithms. It also addresses occlusion in pedestrian detection by using Azure Kinect for body tracking and integrating a robust occlusion management algorithm, significantly enhancing detection efficiency. In experiments, an average latency of 204 milliseconds was measured, and the detection accuracy reached 97%. Additionally, the approach has been applied in creating a simple augmented reality game, demonstrating the practical application of the algorithm.
Abstract: Traffic intersections are highly dangerous for drivers and pedestrians. Statistics from both Canada and the U.S. show a high number of fatalities and serious injuries related to crashes at intersections. In Canada, the National Collision Database shows that 28% of traffic fatalities and 42% of serious injuries in 2019 occurred at intersections. Likewise, the U.S. National Highway Traffic Safety Administration (NHTSA) found that about 40% of the estimated 5,811,000 accidents in the U.S. during the year studied were intersection-related crashes. A major survey by the car insurance industry found that nearly 85% of drivers could not identify the correct action to take when approaching a yellow traffic light at an intersection. One major reason for these accidents is the “yellow light dilemma,” the ambiguous situation in which a driver unexpectedly faced with a yellow light must decide whether to stop or proceed. This situation is further exacerbated by the tendency of aggressive drivers to speed up on the yellow just to get through the traffic light. A survey of Canadian drivers conducted by the Traffic Injury Research Foundation found that 9% of drivers admitted to speeding up to get through a traffic light. Another reason for these accidents is the increased danger of making a left-hand turn on yellow. According to the NHTSA, left turns occur in approximately 22.2% of collisions, as opposed to just 1.2% for right turns. Moreover, a study by CNN found that left turns are three times as likely to kill pedestrians as right turns. Left turns are so much more likely to cause an accident because they take a driver against traffic and into the path of oncoming cars. Additionally, most of these left turns occur at the driver’s discretion, as opposed to during the distressingly brief left-hand arrow at busy intersections.
Drive Safe Now proposes a workable solution for reducing the number of accidents occurring during a yellow light at intersections. We believe this fairly simple solution will save lives, prevent injuries, reduce damage to public and private property, and decrease insurance costs.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Numbers 61502240, 61502096, 61304205, and 61773219; in part by the Natural Science Foundation of Jiangsu Province under Grant Numbers BK20201136 and BK20191401; in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant Number SJCX21_0363; and in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: Vehicle re-identification (ReID) aims to retrieve a target vehicle from an extensive image gallery through its appearance from various views in a cross-camera scenario. It has gradually become a core technology of intelligent transportation systems. Most existing vehicle re-identification models adopt joint learning of global and local features. However, they use the extracted global features directly, resulting in insufficient feature expression. Moreover, local features are primarily obtained through advanced annotation and complex attention mechanisms, which incur additional costs. To solve these issues, a multi-feature learning model with enhanced local attention for vehicle re-identification (MFELA) is proposed in this paper. The model consists of a global branch and a local branch. The global branch utilizes both middle-level and high-level semantic features of ResNet50 to enhance the global representation capability. In addition, multi-scale pooling operations are used to obtain multi-scale information. The local branch utilizes the proposed Region Batch Dropblock (RBD), which encourages the model to learn discriminative features for different local regions and, during training, randomly drops the same corresponding areas across a batch to enhance attention to local regions. Features from both branches are then combined to provide a more comprehensive and distinctive feature representation. Extensive experiments on the VeRi-776 and VehicleID datasets show that the method performs well.
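The idea of dropping the same area across a whole batch, as described for the local branch above, can be sketched as follows. This is an illustrative sketch only: the abstract does not specify how RBD partitions regions, so the code shows just the generic batch-consistent drop, and the function name `batch_drop_block`, the feature-map sizes, and the block size are all assumptions.

```python
import random


def batch_drop_block(batch, drop_h, drop_w, rng):
    """Zero the same randomly placed drop_h x drop_w rectangle in every
    feature map of the batch and return the new batch plus the corner.

    `batch` is a list of 2-D feature maps (lists of rows) of equal size.
    The input is not modified.
    """
    height = len(batch[0])
    width = len(batch[0][0])
    # One location is sampled per batch, not per sample.
    top = rng.randint(0, height - drop_h)
    left = rng.randint(0, width - drop_w)
    dropped = [
        [
            [
                0.0
                if top <= r < top + drop_h and left <= c < left + drop_w
                else value
                for c, value in enumerate(row)
            ]
            for r, row in enumerate(fmap)
        ]
        for fmap in batch
    ]
    return dropped, (top, left)


# Hypothetical batch: 3 feature maps of size 4 x 6, all ones.
rng = random.Random(0)
batch = [[[1.0] * 6 for _ in range(4)] for _ in range(3)]
dropped, (top, left) = batch_drop_block(batch, drop_h=2, drop_w=3, rng=rng)
```

Because the rectangle is shared by every sample in the batch, the network cannot rely on that area for any image in the step, which is the mechanism the abstract credits with strengthening attention to the remaining local regions.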