期刊文献+
共找到92,709篇文章
< 1 2 250 >
每页显示 20 50 100
Self-supervised recalibration network for person re-identification
1
作者 Shaoqi Hou Zhiming Wang +4 位作者 Zhihua Dong Ye Li Zhiguo Wang Guangqiang Yin Xinzhong Wang 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2024年第1期163-178,共16页
The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have ... The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have the following two shortcomings:On the one hand,they mostly use global average pooling to generate context descriptors,without highlighting the guiding role of salient information on descriptor generation,resulting in insufficient ability of the final generated attention mask representation;On the other hand,the design of most attention modules is complicated,which greatly increases the computational cost of the model.To solve these problems,this paper proposes an attention module called self-supervised recalibration(SR)block,which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask.In particular,a special"Squeeze-Excitation"(SE)unit is designed in the SR block to further process the generated intermediate masks,both for nonlinearizations of the features and for constraint of the resulting computation by controlling the number of channels.Furthermore,we combine the most commonly used Res Net-50 to construct the instantiation model of the SR block,and verify its effectiveness on multiple Re-ID datasets,especially the mean Average Precision(m AP)on the Occluded-Duke dataset exceeds the state-of-the-art(SOTA)algorithm by 4.49%. 展开更多
关键词 Person re-identification Attention mechanism Global information Local information Adaptive weighted fusion
下载PDF
Enhancing Unsupervised Domain Adaptation for Person Re-Identification with the Minimal Transfer Cost Framework
2
作者 Sheng Xu Shixiong Xiang +1 位作者 Feiyu Meng Qiang Wu 《Computers, Materials & Continua》 SCIE EI 2024年第9期4197-4218,共22页
In Unsupervised Domain Adaptation(UDA)for person re-identification(re-ID),the primary challenge is reducing the distribution discrepancy between the source and target domains.This can be achieved by implicitly or expl... In Unsupervised Domain Adaptation(UDA)for person re-identification(re-ID),the primary challenge is reducing the distribution discrepancy between the source and target domains.This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain.Implicit construction is difficult due to the absence of intermediate state supervision,making smooth knowledge transfer from the source to the target domain a challenge.To explicitly construct the most suitable intermediate domain for the model to gradually adapt to the feature distribution changes from the source to the target domain,we propose the Minimal Transfer Cost Framework(MTCF).MTCF considers all scenarios of the intermediate domain during the transfer process,ensuring smoother and more efficient domain alignment.Our framework mainly includes threemodules:Intermediate Domain Generator(IDG),Cross-domain Feature Constraint Module(CFCM),and Residual Channel Space Module(RCSM).First,the IDG Module is introduced to generate all possible intermediate domains,ensuring a smooth transition of knowledge fromthe source to the target domain.To reduce the cross-domain feature distribution discrepancy,we propose the CFCM Module,which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance,achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy.We also design the RCSM,which utilizes attention mechanism to enhance the model’s focus on personnel features in low-resolution images,improving the accuracy and efficiency of person re-ID.Our proposed method outperforms existing technologies in all common UDA re-ID tasks and improves the Mean Average Precision(mAP)by 2.3%in the Market to Duke task compared to the state-of-the-art(SOTA)methods. 展开更多
关键词 Person re-identification unsupervised domain adaptation attention mechanism mutual information maximum mean discrepancy
下载PDF
Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network for Visible-Infrared Person Re-Identification 被引量:2
3
作者 Zheng Shi Wanru Song +1 位作者 Junhao Shan Feng Liu 《Computers, Materials & Continua》 SCIE EI 2023年第12期3467-3488,共22页
Visible-infrared Cross-modality Person Re-identification(VI-ReID)is a critical technology in smart public facilities such as cities,campuses and libraries.It aims to match pedestrians in visible light and infrared ima... Visible-infrared Cross-modality Person Re-identification(VI-ReID)is a critical technology in smart public facilities such as cities,campuses and libraries.It aims to match pedestrians in visible light and infrared images for video surveillance,which poses a challenge in exploring cross-modal shared information accurately and efficiently.Therefore,multi-granularity feature learning methods have been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body structure attributes.However,existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks,the fusion module.This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network(ADMPFF-Net),incorporating the Multi-Granularity Pose-Aware Feature Fusion(MPFF)module to generate discriminative representations.MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network.ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks.By incorporating the multi-granularity feature disentanglement(mGFD)and posture information segmentation(pIS)strategies,it extracts more representative features concerning body structure information.The Local Information Enhancement(LIE)module augments high-performance features in VI-ReID,and the multi-granularity joint loss supervises model training for objective feature learning.Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID. 展开更多
关键词 Visible-infrared person re-identification MULTI-GRANULARITY feature learning modality
下载PDF
Person Re-Identification with Model-Contrastive Federated Learning in Edge-Cloud Environment 被引量:1
4
作者 Baixuan Tang Xiaolong Xu +1 位作者 Fei Dai Song Wang 《Intelligent Automation & Soft Computing》 2023年第10期35-55,共21页
Person re-identification(ReID)aims to recognize the same person in multiple images from different camera views.Training person ReID models are time-consuming and resource-intensive;thus,cloud computing is an appropria... Person re-identification(ReID)aims to recognize the same person in multiple images from different camera views.Training person ReID models are time-consuming and resource-intensive;thus,cloud computing is an appropriate model training solution.However,the required massive personal data for training contain private information with a significant risk of data leakage in cloud environments,leading to significant communication overheads.This paper proposes a federated person ReID method with model-contrastive learning(MOON)in an edge-cloud environment,named FRM.Specifically,based on federated partial averaging,MOON warmup is added to correct the local training of individual edge servers and improve the model’s effectiveness by calculating and back-propagating a model-contrastive loss,which represents the similarity between local and global models.In addition,we propose a lightweight person ReID network,named multi-branch combined depth space network(MB-CDNet),to reduce the computing resource usage of the edge device when training and testing the person ReID model.MB-CDNet is a multi-branch version of combined depth space network(CDNet).We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model.The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baseline. 展开更多
关键词 Person re-identification federated learning contrastive learning
下载PDF
Dual-stream coupling network with wavelet transform for cross-resolution person re-identification
5
作者 SUN Rui YANG Zi +1 位作者 ZHAO Zhenghui ZHANG Xudong 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2023年第3期682-695,共14页
Person re-identification is a prevalent technology deployed on intelligent surveillance.There have been remarkable achievements in person re-identification methods based on the assumption that all person images have a... Person re-identification is a prevalent technology deployed on intelligent surveillance.There have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently high resolution,yet such models are not applicable to the open world.In real world,the changing distance between pedestrians and the camera renders the resolution of pedestrians captured by the camera inconsistent.When low-resolution(LR)images in the query set are matched with high-resolution(HR)images in the gallery set,it degrades the performance of the pedestrian matching task due to the absent pedestrian critical information in LR images.To address the above issues,we present a dualstream coupling network with wavelet transform(DSCWT)for the cross-resolution person re-identification task.Firstly,we use the multi-resolution analysis principle of wavelet transform to separately process the low-frequency and high-frequency regions of LR images,which is applied to restore the lost detail information of LR images.Then,we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR images and HR images for accessing pedestrian invariant features at various resolutions.Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach. 展开更多
关键词 cross-resolution feature invariant learning person re-identification residual knowledge transfer wavelet transform
下载PDF
Pure Detail Feature Extraction Network for Visible-Infrared Re-Identification
6
作者 Jiaao Cui Sixian Chan +2 位作者 Pan Mu Tinglong Tang Xiaolong Zhou 《Intelligent Automation & Soft Computing》 SCIE 2023年第8期2263-2277,共15页
Cross-modality pedestrian re-identification has important appli-cations in the field of surveillance.Due to variations in posture,camera per-spective,and camera modality,some salient pedestrian features are difficult ... Cross-modality pedestrian re-identification has important appli-cations in the field of surveillance.Due to variations in posture,camera per-spective,and camera modality,some salient pedestrian features are difficult to provide effective retrieval cues.Therefore,it becomes a challenge to design an effective strategy to extract more discriminative pedestrian detail.Although many effective methods for detailed feature extraction are proposed,there are still some shortcomings in filtering background and modality noise.To further purify the features,a pure detail feature extraction network(PDFENet)is proposed for VI-ReID.PDFENet includes three modules,adaptive detail mask generation module(ADMG),inter-detail interaction module(IDI)and cross-modality cross-entropy(CMCE).ADMG and IDI use human joints and their semantic associations to suppress background noise in features.CMCE guides the model to ignore modality noise by generating modality-shared feature labels.Specifically,ADMG generates masks for pedestrian details based on pose estimation.Masks are used to suppress background information and enhance pedestrian detail information.Besides,IDI mines the semantic relations among details to further refine the features.Finally,CMCE cross-combines classifiers and features to generate modality-shared feature labels to guide model training.Extensive ablation experiments as well as visualization results have demonstrated the effectiveness of PDFENet in eliminating background and modality noise.In addition,comparison experi-ments in two publicly available datasets also show the competitiveness of our approach. 展开更多
关键词 Person re-identification multimedia image retrieval
下载PDF
Enhancing Dense Small Object Detection in UAV Images Based on Hybrid Transformer 被引量:1
7
作者 Changfeng Feng Chunping Wang +2 位作者 Dongdong Zhang Renke Kou Qiang Fu 《Computers, Materials & Continua》 SCIE EI 2024年第3期3993-4013,共21页
Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unman... Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection. 展开更多
关键词 UAV images TRANSFORMER dense small object detection
下载PDF
Masked Autoencoders as Single Object Tracking Learners 被引量:1
8
作者 Chunjuan Bo XinChen Junxing Zhang 《Computers, Materials & Continua》 SCIE EI 2024年第7期1105-1122,共18页
Significant advancements have beenwitnessed in visual tracking applications leveragingViT in recent years,mainly due to the formidablemodeling capabilities of Vision Transformer(ViT).However,the strong performance of ... Significant advancements have beenwitnessed in visual tracking applications leveragingViT in recent years,mainly due to the formidablemodeling capabilities of Vision Transformer(ViT).However,the strong performance of such trackers heavily relies on ViT models pretrained for long periods,limitingmore flexible model designs for tracking tasks.To address this issue,we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders,called TrackMAE.During pretraining,we employ two shared-parameter ViTs,serving as the appearance encoder and motion encoder,respectively.The appearance encoder encodes randomly masked image data,while the motion encoder encodes randomly masked pairs of video frames.Subsequently,an appearance decoder and a motion decoder separately reconstruct the original image data and video frame data at the pixel level.In this way,ViT learns to understand both the appearance of images and the motion between video frames simultaneously.Experimental results demonstrate that ViT-Base and ViT-Large models,pretrained with TrackMAE and combined with a simple tracking head,achieve state-of-the-art(SOTA)performance without additional design.Moreover,compared to the currently popular MAE pretraining methods,TrackMAE consumes only 1/5 of the training time,which will facilitate the customization of diverse models for tracking.For instance,we additionally customize a lightweight ViT-XS,which achieves SOTA efficient tracking performance. 展开更多
关键词 Visual object tracking vision transformer masked autoencoder visual representation learning
下载PDF
Visible-infrared person re-identification using query related cluster
9
作者 赵倩倩 WU Hanxiao +2 位作者 HUANG Linhan ZHU Jianqing ZENG Huanqiang 《High Technology Letters》 EI CAS 2023年第2期194-205,共12页
Visible-infrared person re-identification(VIPR), is a cross-modal retrieval task that searches a target from a gallery captured by cameras of different spectrums.The severe challenge for VIPR is the large intra-class ... Visible-infrared person re-identification(VIPR), is a cross-modal retrieval task that searches a target from a gallery captured by cameras of different spectrums.The severe challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images.For that, this paper proposes a query related cluster(QRC) method for VIPR.Firstly, this paper uses an attention mechanism to calculate the similarity relation between a visible query and infrared images with the same identity in the gallery.Secondly, those infrared images with the same query images are aggregated by using the similarity relation to form a dynamic clustering center corresponding to the query image.Thirdly, QRC loss function is designed to enlarge the similarity between the query image and its dynamic cluster center to achieve query related clustering, so as to compact the intra-class variations.Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can well characterize intra-class variations in VIPR.Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, acquiring a 90.77% rank-1 identification rate on the RegDB dataset. 展开更多
关键词 query related cluster(QRC) cross-modality visible-infrared person re-identification(VIPR) video surveillance
下载PDF
Review of Visible-Infrared Cross-Modality Person Re-Identification
10
作者 Yinyin Zhang 《Journal of New Media》 2023年第1期23-31,共9页
Person re-identification(ReID)is a sub-problem under image retrieval.It is a technology that uses computer vision to identify a specific pedestrian in a collection of pictures or videos.The pedestrian image under cros... Person re-identification(ReID)is a sub-problem under image retrieval.It is a technology that uses computer vision to identify a specific pedestrian in a collection of pictures or videos.The pedestrian image under cross-device is taken from a monitored pedestrian image.At present,most ReID methods deal with the matching between visible and visible images,but with the continuous improvement of security monitoring system,more and more infrared cameras are used to monitor at night or in dim light.Due to the image differences between infrared camera and RGB camera,there is a huge visual difference between cross-modality images,so the traditional ReID method is difficult to apply in this scene.In view of this situation,studying the pedestrian matching between visible and infrared modalities is particularly crucial.Visible-infrared person re-identification(VI-ReID)was first proposed in 2017,and then attracted more and more attention,and many advanced methods emerged. 展开更多
关键词 Person re-identification cross-modality
下载PDF
Re-identifying beef cattle using improved AlignedReID++
11
作者 YING Xiaoyi ZHAO Jizheng +7 位作者 YANG Lingling ZHOU Xinyi WANG Lei GAO Yannian ZAN Linsen YANG Wucai LIU Han SONG Huaibo 《农业工程学报》 EI CAS CSCD 北大核心 2024年第18期132-146,共15页
Accurate and continuous identification of individual cattle is crucial to precision farming in recent years.It is also the prerequisite to monitor the individual feed intake and feeding time of beef cattle at medium t... Accurate and continuous identification of individual cattle is crucial to precision farming in recent years.It is also the prerequisite to monitor the individual feed intake and feeding time of beef cattle at medium to long distances over different cameras.However,beef cattle can tend to frequently move and change their feeding position during feeding.Furthermore,the great variations in their head direction and complex environments(light,occlusion,and background)can also lead to some difficulties in the recognition,particularly for the bio-similarities among individual cattle.Among them,AlignedReID++model is characterized by both global and local information for image matching.In particular,the dynamically matching local information(DMLI)algorithm has been introduced into the local branch to automatically align the horizontal local information.In this research,the AlignedReID++model was utilized and improved to achieve the better performance in cattle re-identification(ReID).Initially,triplet attention(TA)modules were integrated into the BottleNecks of ResNet50 Backbone.The feature extraction was then enhanced through cross-dimensional interactions with the minimal computational overhead.Since the TA modules in AlignedReID++baseline model increased the model size and floating point operations(FLOPs)by 0.005 M and 0.05 G,the rank-1 accuracy and mean average precision(mAP)were improved by 1.0 percentage points and 2.94 percentage points,respectively.Specifically,the rank-1 accuracies were outperformed by 0.86 percentage points and 0.12 percentage points,respectively,compared with the convolution block attention module(CBAM)and efficient channel attention(ECA)modules,although 0.94 percentage points were lower than that of squeeze-and-excitation(SE)modules.The mAP metric values were exceeded by 0.22,0.86 and 0.12 percentage points,respectively,compared with the SE,CBAM,and ECA modules.Additionally,the Cross-Entropy Loss function was replaced with the CosFace Loss function in the global branch of baseline model.CosFace Loss and Hard Triplet Loss were jointly employed to train the baseline model for the better identification on the similar individuals.AlignedReID++with CosFace Loss was outperformed the baseline model by 0.24 and 0.92 percentage points in the rank-1 accuracy and mAP,respectively,whereas,AlignedReID++with ArcFace Loss was exceeded by 0.36 and 0.56 percentage points,respectively.The improved model with the TA modules and CosFace Loss was achieved in a rank-1 accuracy of 94.42%,rank-5 accuracy of 98.78%,rank-10 accuracy of 99.34%,mAP of 63.90%,FLOPs of 5.45 G,frames per second(FPS)of 5.64,and model size of 23.78 M.The rank-1 accuracies were exceeded by 1.84,4.72,0.76 and 5.36 percentage points,respectively,compared with the baseline model,part-based convolutional baseline(PCB),multiple granularity network(MGN),and relation-aware global attention(RGA),while the mAP metrics were surpassed 6.42,5.86,4.30 and 7.38 percentage points,respectively.Meanwhile,the rank-1 accuracy was 0.98 percentage points lower than TransReID,but the mAP metric was exceeded by 3.90 percentage points.Moreover,the FLOPs of improved model were only 0.05 G larger than that of baseline model,while smaller than those of PCB,MGN,RGA,and TransReID by 0.68,6.51,25.4,and 16.55 G,respectively.The model size of improved model was 23.78 M,which was smaller than those of the baseline model,PCB,MGN,RGA,and TransReID by 0.03,2.33,45.06,14.53 and 62.85 M,respectively.The inference speed of improved model on a CPU was lower than those of PCB,MGN,and baseline model,but higher than TransReID and RGA.The t-SNE feature embedding visualization demonstrated that the global and local features were achieve in the better intra-class compactness and inter-class variability.Therefore,the improved model can be expected to effectively re-identify the beef cattle in natural environments of breeding farm,in order to monitor the individual feed intake and feeding time. 展开更多
关键词 method IDENTIFY beef cattle precision livestock re-identification AlignedReID++ deep learning
下载PDF
Enhanced Object Detection and Classification via Multi-Method Fusion
12
作者 Muhammad Waqas Ahmed Nouf Abdullah Almujally +2 位作者 Abdulwahab Alazeb Asaad Algarni Jeongmin Park 《Computers, Materials & Continua》 SCIE EI 2024年第5期3315-3331,共17页
Advances in machine vision systems have revolutionized applications such as autonomous driving,robotic navigation,and augmented reality.Despite substantial progress,challenges persist,including dynamic backgrounds,occ... Advances in machine vision systems have revolutionized applications such as autonomous driving,robotic navigation,and augmented reality.Despite substantial progress,challenges persist,including dynamic backgrounds,occlusion,and limited labeled data.To address these challenges,we introduce a comprehensive methodology toenhance image classification and object detection accuracy.The proposed approach involves the integration ofmultiple methods in a complementary way.The process commences with the application of Gaussian filters tomitigate the impact of noise interference.These images are then processed for segmentation using Fuzzy C-Meanssegmentation in parallel with saliency mapping techniques to find the most prominent regions.The Binary RobustIndependent Elementary Features(BRIEF)characteristics are then extracted fromdata derived fromsaliency mapsand segmented images.For precise object separation,Oriented FAST and Rotated BRIEF(ORB)algorithms areemployed.Genetic Algorithms(GAs)are used to optimize Random Forest classifier parameters which lead toimproved performance.Our method stands out due to its comprehensive approach,adeptly addressing challengessuch as changing backdrops,occlusion,and limited labeled data concurrently.A significant enhancement hasbeen achieved by integrating Genetic Algorithms(GAs)to precisely optimize parameters.This minor adjustmentnot only boosts the uniqueness of our system but also amplifies its overall efficacy.The proposed methodologyhas demonstrated notable classification accuracies of 90.9%and 89.0%on the challenging Corel-1k and MSRCdatasets,respectively.Furthermore,detection accuracies of 87.2%and 86.6%have been attained.Although ourmethod performed well in both datasets it may face difficulties in real-world data especially where datasets havehighly complex backgrounds.Despite these limitations,GAintegration for parameter optimization shows a notablestrength in enhancing the overall adaptability and performance of our system. 展开更多
关键词 BRIEF features saliency map fuzzy c-means object detection object recognition
下载PDF
Confusing Object Detection:A Survey
13
作者 Kunkun Tong Guchu Zou +5 位作者 Xin Tan Jingyu Gong Zhenyi Qi Zhizhong Zhang Yuan Xie Lizhuang Ma 《Computers, Materials & Continua》 SCIE EI 2024年第9期3421-3461,共41页
Confusing object detection(COD),such as glass,mirrors,and camouflaged objects,represents a burgeoning visual detection task centered on pinpointing and distinguishing concealed targets within intricate backgrounds,lev... Confusing object detection(COD),such as glass,mirrors,and camouflaged objects,represents a burgeoning visual detection task centered on pinpointing and distinguishing concealed targets within intricate backgrounds,leveraging deep learning methodologies.Despite garnering increasing attention in computer vision,the focus of most existing works leans toward formulating task-specific solutions rather than delving into in-depth analyses of methodological structures.As of now,there is a notable absence of a comprehensive systematic review that focuses on recently proposed deep learning-based models for these specific tasks.To fill this gap,our study presents a pioneering review that covers both themodels and the publicly available benchmark datasets,while also identifying potential directions for future research in this field.The current dataset primarily focuses on single confusing object detection at the image level,with some studies extending to video-level data.We conduct an in-depth analysis of deep learning architectures,revealing that the current state-of-the-art(SOTA)COD methods demonstrate promising performance in single object detection.We also compile and provide detailed descriptions ofwidely used datasets relevant to these detection tasks.Our endeavor extends to discussing the limitations observed in current methodologies,alongside proposed solutions aimed at enhancing detection accuracy.Additionally,we deliberate on relevant applications and outline future research trajectories,aiming to catalyze advancements in the field of glass,mirror,and camouflaged object detection. 展开更多
关键词 Confusing object detection mirror detection glass detection camouflaged object detection deep learning
下载PDF
Floating Waste Discovery by Request via Object-Centric Learning
14
作者 Bingfei Fu 《Computers, Materials & Continua》 SCIE EI 2024年第7期1407-1424,共18页
Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects an... Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection.Consequently,devising algorithms capable of accurately localizing specific objects within a scene in scenarios where annotated data is limited remains a formidable challenge.To solve this problem,this paper proposes an object discovery by request problem setting and a corresponding algorithmic framework.The proposed problem setting aims to identify specified objects in scenes,and the associated algorithmic framework comprises pseudo data generation and object discovery by request network.Pseudo-data generation generates images resembling natural scenes through various data augmentation rules,using a small number of object samples and scene images.The network structure of object discovery by request utilizes the pre-trained Vision Transformer(ViT)model as the backbone,employs object-centric methods to learn the latent representations of foreground objects,and applies patch-level reconstruction constraints to the model.During the validation phase,we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets.Experiments have proved that our method achieves state-of-the-art performance on Unmanned Aerial Vehicles-Bottle Detection(UAV-BD)dataset and self-constructed dataset Bottle,especially in multi-object scenarios. 展开更多
关键词 Unsupervised object discovery object-centric learning pseudo data generation real-world object discovery by request
下载PDF
A Review of Research on Person Re-identification in Surveillance Video
15
作者 Yunzuo ZHANG Weiqi LIAN 《Mechanical Engineering Science》 2023年第2期1-7,共7页
Person re-identification has emerged as a hotspot for computer vision research due to the growing demands of social public safety requirements and the quick development of intelligent surveillance networks.Person re-i... Person re-identification has emerged as a hotspot for computer vision research due to the growing demands of social public safety requirements and the quick development of intelligent surveillance networks.Person re-identification(Re-ID)in video surveillance system can track and identify suspicious people,track and statistically analyze persons.The purpose of person re-identification is to recognize the same person in different cameras.Deep learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity.The purpose of this paperis to help researchers better understand where person re-identification research is at the moment and where it is headed.Firstly,this paper arranges the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years.Then,the commonly used method techniques are also discussed from four aspects:appearance features,metric learning,local features,and adversarial learning.Finally,future research directions in the field of person re-identification are outlooked. 展开更多
关键词 Person re-identification Deep learning Metric learning Local features Adversarial learning
下载PDF
Two-Layer Attention Feature Pyramid Network for Small Object Detection
16
作者 Sheng Xiang Junhao Ma +2 位作者 Qunli Shang Xianbao Wang Defu Chen 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第10期713-731,共19页
Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain les... Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain less information.Many current methods,particularly those based on Feature Pyramid Network(FPN),address this challenge by leveraging multi-scale feature fusion.However,existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers,leading to suboptimal small object detection.To address this problem,we propose the Two-layerAttention Feature Pyramid Network(TA-FPN),featuring two key modules:the Two-layer Attention Module(TAM)and the Small Object Detail Enhancement Module(SODEM).TAM uses the attention module to make the network more focused on the semantic information of the object and fuse it to the lower layer,so that each layer contains similar semantic information,to alleviate the problem of small object information being submerged due to semantic gaps between different layers.At the same time,SODEM is introduced to strengthen the local features of the object,suppress background noise,enhance the information details of the small object,and fuse the enhanced features to other feature layers to ensure that each layer is rich in small object information,to improve small object detection accuracy.Our extensive experiments on challenging datasets such as Microsoft Common Objects inContext(MSCOCO)and Pattern Analysis Statistical Modelling and Computational Learning,Visual Object Classes(PASCAL VOC)demonstrate the validity of the proposedmethod.Experimental results show a significant improvement in small object detection accuracy compared to state-of-theart detectors. 展开更多
关键词 Small object detection two-layer attention module small object detail enhancement module feature pyramid network
下载PDF
Rail-Pillar Net:A 3D Detection Network for Railway Foreign Object Based on LiDAR
17
作者 Fan Li Shuyao Zhang +2 位作者 Jie Yang Zhicheng Feng Zhichao Chen 《Computers, Materials & Continua》 SCIE EI 2024年第9期3819-3833,共15页
Aiming at the limitations of the existing railway foreign object detection methods based on two-dimensional(2D)images,such as short detection distance,strong influence of environment and lack of distance information,w... Aiming at the limitations of the existing railway foreign object detection methods based on two-dimensional(2D)images,such as short detection distance,strong influence of environment and lack of distance information,we propose Rail-PillarNet,a three-dimensional(3D)LIDAR(Light Detection and Ranging)railway foreign object detection method based on the improvement of PointPillars.Firstly,the parallel attention pillar encoder(PAPE)is designed to fully extract the features of the pillars and alleviate the problem of local fine-grained information loss in PointPillars pillars encoder.Secondly,a fine backbone network is designed to improve the feature extraction capability of the network by combining the coding characteristics of LIDAR point cloud feature and residual structure.Finally,the initial weight parameters of the model were optimised by the transfer learning training method to further improve accuracy.The experimental results on the OSDaR23 dataset show that the average accuracy of Rail-PillarNet reaches 58.51%,which is higher than most mainstream models,and the number of parameters is 5.49 M.Compared with PointPillars,the accuracy of each target is improved by 10.94%,3.53%,16.96%and 19.90%,respectively,and the number of parameters only increases by 0.64M,which achieves a balance between the number of parameters and accuracy. 展开更多
关键词 Railway foreign object light detection and ranging(LiDAR) 3D object detection PointPillars parallel attention mechanism transfer learning
下载PDF
A Secure and Cost-Effective Training Framework Atop Serverless Computing for Object Detection in Blasting
18
作者 Tianming Zhang Zebin Chen +4 位作者 Haonan Guo Bojun Ren Quanmin Xie Mengke Tian Yong Wang 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第5期2139-2154,共16页
The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection ... The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection in the field of blasting.Serverless Computing can provide a variety of computing services for people without hardware foundations and rich software development experience,which has aroused people’s interest in how to use it in the field ofmachine learning.In this paper,we design a distributedmachine learning training application based on the AWS Lambda platform.Based on data parallelism,the data aggregation and training synchronization in Function as a Service(FaaS)are effectively realized.It also encrypts the data set,effectively reducing the risk of data leakage.We rent a cloud server and a Lambda,and then we conduct experiments to evaluate our applications.Our results indicate the effectiveness,rapidity,and economy of distributed training on FaaS. 展开更多
关键词 Serverless computing object detection BLASTING
下载PDF
SMSTracker:A Self-Calibration Multi-Head Self-Attention Transformer for Visual Object Tracking
19
作者 Zhongyang Wang Hu Zhu Feng Liu 《Computers, Materials & Continua》 SCIE EI 2024年第7期605-623,共19页
Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have becom... Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information.However,current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information.In this paper,we introduce self-calibration multi-head self-attention Transformer(SMSTracker)as a solution to these challenges.It employs a hybrid tensor decomposition self-organizing multihead self-attention transformermechanism,which not only compresses and accelerates Transformer operations but also significantly reduces redundant data,thereby enhancing the accuracy and efficiency of tracking.Additionally,we introduce a self-calibration attention fusion block to resolve common issues of attention ambiguities and inconsistencies found in traditional trackingmethods,ensuring the stability and reliability of tracking performance across various scenarios.By integrating a hybrid tensor decomposition approach with a self-organizingmulti-head self-attentive transformer mechanism,SMSTracker enhances the efficiency and accuracy of the tracking process.Experimental results show that SMSTracker achieves competitive performance in visual object tracking,promising more robust and efficient tracking systems,demonstrating its potential to providemore robust and efficient tracking solutions in real-world applications. 展开更多
关键词 Visual object tracking tensor decomposition TRANSFORMER self-attention
下载PDF
Learning Discriminatory Information for Object Detection on Urine Sediment Image
20
作者 Sixian Chan Binghui Wu +2 位作者 Guodao Zhang Yuan Yao Hongqiang Wang 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第1期411-428,共18页
In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,... In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5. 展开更多
关键词 object detection attention mechanism medical image urine sediment
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部