Transformer-based models have facilitated significant advances in object detection. However, their high computational cost and suboptimal detection of dense small objects limit their applicability to unmanned aerial vehicle (UAV) imagery. To address these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, yielding an accurate and efficient model. First, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies that enhance detection performance without incurring additional inference computation. The query filter is designed to cope with the dense clustering inherent in drone-captured images by suppressing similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, which achieves a significant improvement in accuracy together with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model for UAV image object detection.
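For orientation, the query filter described above builds on non-maximum suppression. The sketch below is plain greedy NMS over axis-aligned boxes, not H-DETR's training-aware variant, whose details the abstract does not give. The function names, the (x1, y1, x2, y2) box layout, and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it overlaps every already-kept box by at most
    iou_threshold (an assumed value, not taken from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

In dense scenes the threshold matters: a low value merges neighbouring small objects, which is one reason a training-time filter on queries can be preferable to post-hoc suppression.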
Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to focus the network on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged by semantic gaps between layers. In addition, SODEM strengthens the local features of the object, suppresses background noise, enhances the details of small objects, and fuses the enhanced features into the other feature layers so that each layer is rich in small object information, improving small object detection accuracy. Extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MS COCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in small object detection on unmanned aerial vehicles (UAVs). The task is challenging due to variations in UAV flight altitude, differences in object scale, and factors such as flight speed and motion blur. To improve the detection of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted P2, to improve adaptability to small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of fine feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the utilization of multi-scale feature information. On the VisDrone2023 and UAVDT datasets, MSC-YOLO outperforms the baseline YOLOv7 by 3.0% in mean average precision (mAP). The proposed MSC-YOLO algorithm demonstrates satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
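The abstract does not describe the internals of the CSPDC module, so the following is only a sketch of the generic space-to-depth rearrangement that such modules typically start from: each 2x2 spatial block is moved into the channel dimension, halving the resolution without discarding any pixel. The function name and the nested-list (C, H, W) layout are assumptions, and the convolution that would follow is omitted.

```python
def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) nested list into (C * block**2, H / block, W / block).

    Unlike strided convolution or pooling, every input value survives:
    pixels at each (dy, dx) offset inside a block become their own channel.
    """
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                # Take every block-th row starting at dy, every block-th column at dx.
                out.append([row[dx::block] for row in x[c][dy::block]])
    return out
```

For a single 4x4 channel, the result is four 2x2 channels, one per offset inside the 2x2 block.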
In recent years, there has been extensive research on object detection in optical remote sensing images using convolutional neural networks. Despite these efforts, detecting small objects in remote sensing remains a formidable challenge. Deep network structures cause a loss of object features, nearly eliminating some subtle features associated with small objects in deep layers. Additionally, the features of small objects are susceptible to interference from background features within the image, leading to a decline in detection accuracy. Moreover, the sensitivity of small objects to bounding box perturbation further increases detection difficulty. In this paper, we introduce a novel approach, Cross-Layer Fusion and Weighted Receptive Field-based YOLO (CAW-YOLO), specifically designed for small object detection in remote sensing. To address feature loss in deep layers, we devise a cross-layer attention fusion module. Background noise is filtered by incorporating Bi-Level Routing Attention (BRA). To enhance the model's capacity to perceive multi-scale objects, particularly small-scale objects, we introduce a weighted multi-receptive field atrous spatial pyramid pooling module. Furthermore, we mitigate the sensitivity arising from bounding box perturbation by combining the Normalized Wasserstein Distance (NWD) and Efficient Intersection over Union (EIoU) losses. The efficacy of the proposed model in detecting small objects in remote sensing has been validated through experiments on three publicly available datasets. The experimental results demonstrate the model's pronounced advantages in small object detection for remote sensing, surpassing the performance of current mainstream models.
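To make the EIoU term concrete, here is a minimal sketch of the loss as it is commonly defined: one minus IoU, plus penalties on center distance, width difference, and height difference, each normalized by the smallest enclosing box. The function name and the (x1, y1, x2, y2) layout are assumptions, and how CAW-YOLO weights this against NWD is not stated in the abstract.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]

    # Plain IoU.
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])

    # Squared distance between box centers.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + width_term + height_term
```

Because the extra terms stay informative when boxes do not overlap, the loss still provides a gradient signal for tiny boxes that a pure IoU loss would treat as equally wrong.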
Insulator defect detection plays a vital role in maintaining the secure operation of power systems. To address the difficulty of detecting small objects and the missed detections caused by the small scale, variable scale, and fuzzy edge morphology of insulator defects, we construct an insulator dataset of 1600 samples containing flashovers and breakages. We then propose a simple and effective surface defect detection method for power line insulators that targets difficult small objects. First, a high-resolution feature map is introduced and a small object prediction layer is added so that the model can detect tiny objects. Second, a simplified adaptive spatial feature fusion (SASFF) module is introduced to perform cross-scale spatial fusion and improve adaptability to variable multi-scale features. Finally, we propose an enhanced deformable attention mechanism (EDAM) module. By integrating a gating activation function, the module encourages the model to learn a small number of critical sampling points near the reference points, which improves the perception of object morphology. The experimental results indicate that, on the dataset of flashover and breakage defects, this method improves the performance of YOLOv5, YOLOv7, and YOLOv8. In practical applications, it can simply and effectively improve the precision of power line insulator defect detection and reduce missed detections for difficult small objects.
Infrared small target detection is a common task in infrared image processing. Under limited computational resources, traditional methods for infrared small target detection face a trade-off between detection rate and accuracy. A fast infrared small target detection method tailored for resource-constrained conditions is proposed, based on the YOLOv5s model. The method introduces an additional small target detection head and replaces the original Intersection over Union (IoU) metric with the Normalized Wasserstein Distance (NWD), while considering both the detection accuracy and the detection speed of infrared small targets. Experimental results demonstrate that the proposed algorithm achieves a maximum effective detection speed of 95 FPS on a 15 W TPU, while reaching a maximum effective detection accuracy of 91.9 AP@0.5, effectively improving the efficiency of infrared small target detection under resource-constrained conditions.
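As a rough illustration of why NWD suits tiny targets, the sketch below follows the usual definition: each box is modeled as a 2D Gaussian, the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance over (cx, cy, w/2, h/2), and an exponential maps it into (0, 1]. The function name and the (cx, cy, w, h) layout are assumptions. The constant c is dataset-dependent, and the default used here is an arbitrary placeholder, not a value from the paper.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    c is a normalizing constant tied to typical object size in the dataset;
    the default is a placeholder assumption.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)
```

Two 4x4 boxes whose centers are 6 pixels apart do not overlap, so their IoU is exactly 0, yet NWD still returns a graded similarity that decreases smoothly as they move apart.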
In oriented bounding box (OBB) object detection for high-resolution remote sensing images, small targets are often missed or wrongly detected because they are too small and have different orientations. Existing OBB object detection methods for remote sensing images, although making good progress, mainly focus on directional modeling and give less consideration to object size and missed detections. In this study, a method based on an improved YOLOv8 is proposed for detecting oriented objects in remote sensing images, improving detection precision. First, the ResCBAMG module is designed to better extract channel and spatial correlation information. Second, a top-down feature fusion layer structure is proposed in conjunction with the Efficient Channel Attention (ECA) module, which helps capture local cross-channel interaction information. Finally, a ResCBAMG module is introduced between the C2f modules and the detection heads of the bottom-up feature fusion layer. This structure helps the model focus on the target area and improves the precision and robustness of oriented target detection. Experimental results on the DOTA-v1.5 dataset show that the Precision, mAP@0.5, and mAP@0.5:0.95 of the improved model are better than those of the original model. The improvement is effective for small targets and complex scenes.
The detection of large-scale objects has achieved high accuracy, but the detection of small objects has not enjoyed similar success, owing to their low peak signal-to-noise ratio (PSNR), fewer distinguishing features, and tendency to be occluded by their surroundings. To solve this problem, this paper proposes an attention mechanism based on cross-Key values. Building on the traditional transformer, the paper first improves feature processing with a convolution module, effectively maintaining the local semantic context in the middle layer and significantly reducing the number of model parameters. Then, to enhance the effectiveness of the attention mask, two Key values are calculated simultaneously along Query and Value using dual-branch parallel processing, which strengthens the attention acquisition mode and improves the coupling of key information. Finally, focusing on the feature maps of different channels, the multi-head attention mechanism is applied to the channel attention mask to improve feature utilization in the middle layer. In comparisons on three small object datasets, the proposed plug-and-play interactive transformer (IT-transformer) module effectively improves the detection results of the baseline.
Recently, object detection based on convolutional neural networks (CNNs) has developed rapidly. The backbone network for basic feature extraction is an important component of the whole detection task. We therefore present a new feature extraction strategy in this paper, named DSAFF-Net. The strategy comprises: 1) a sandwich attention feature fusion module (SAFF module), whose purpose is to enhance the semantic information of shallow features and the resolution of deep features, which benefits small object detection after feature fusion; and 2) a new stage called D-block, added to alleviate the loss of spatial resolution that occurs when pooling layers enlarge the receptive field. The method proposed in the new stage replaces the original method of obtaining the P6 feature map and uses the result as the input of the region proposal network (RPN). In the experimental phase, we use the new strategy to extract features. Experiments are conducted on the Microsoft Common Objects in Context (MS COCO) object detection dataset and a Corona Virus Disease 2019 (COVID-19) image classification dataset. The results show that the average recognition accuracy of COVID-19 on the classification dataset is improved to 98.163%, and small object detection in object detection tasks is improved by 4.0%.
In the past several years, remarkable achievements have been made in the field of object detection. Although performance is generally improving, the accuracy of small object detection remains low compared with that of large object detection. In addition, localization misalignment issues are common for small objects, as seen in GoogLeNets and residual networks (ResNets). To address this problem, we propose an improved region-based fully convolutional network (R-FCN). The presented technique improves detection accuracy and eliminates localization misalignment by replacing position-sensitive region of interest (PS-RoI) pooling with position-sensitive precise region of interest (PS-Pr-RoI) pooling, which avoids coordinate quantization and directly calculates two-order integrals for position-sensitive score maps, thus preventing a loss of spatial precision. A validation experiment was conducted in which the Microsoft Common Objects in Context (MS COCO) training dataset was oversampled. Results showed an accuracy improvement of 3.7% for object detection tasks and an increase of 6.0% for small objects.
To solve the problem of small object detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. Bird's nests on high-voltage towers are taken as the research object. First, we use an improved ResNet101 convolutional neural network to extract object features, and then use multi-scale sliding windows to obtain object region proposals on convolutional feature maps of different resolutions. Finally, a deconvolution operation is added to further enhance the selected higher-resolution feature map, which is then taken as the feature mapping layer from which the region proposals are passed to the object detection sub-network. The detection results for bird's nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
Object detection has been studied for many years, and convolutional neural networks have brought great progress in its accuracy and speed. However, because small objects have low resolution and fuzzy feature representations, one remaining challenge is how to detect small objects in images effectively. Existing detectors for small objects take one of two approaches: using high-resolution images as input, or increasing the depth of the CNN. Both undoubtedly increase computational cost and running time. In this paper, based on the RefineDet framework, we propose the network structure RF2Det, which introduces the Receptive Field Block to address small object detection while balancing speed and accuracy. We also propose a Medium-level Feature Pyramid Network, which combines appropriate high-level context features with low-level features so that the network can use both for multi-scale target detection, improving the accuracy of small target detection based on low-level features. Extensive experiments on the MS COCO dataset demonstrate that, compared with other state-of-the-art methods, the proposed method shows a significant performance improvement in detecting small objects.
In pursuit of cost-effective manufacturing, enterprises are increasingly using recycled semiconductor chips. To ensure consistent chip orientation during packaging, a circular marker on the front side is used for pin alignment after successful functional testing. However, recycled chips often exhibit substantial surface wear, and identifying the relatively small marker is challenging. Moreover, the complexity of generic target detection algorithms hampers deployment. To address these issues, this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips, termed Van-YOLOv8. First, to reduce the influence of small, low-resolution markings on the precision of deep learning models, we upscale the images for enhanced resolution. This step relies on the Super-Resolution Generative Adversarial Network with Extended Training (SRGANext), which reconstructs high-fidelity images that match the input specifications. Next, we replace the backbone feature extraction network of the original YOLOv8s model with the lightweight Vanilla Network (VanillaNet), simplifying the branch structure to reduce network parameters. Finally, a Hybrid Attention Mechanism (HAM) is implemented to capture essential details from input images, improving feature representation while also speeding up model inference. Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in several respects. Notably, it is superior in parameter count, computational complexity, target identification precision, and speed when compared with several prevalent algorithms. The proposed approach is promising for real-time detection of recycled chips in practical factory settings.
Printed Circuit Board (PCB) surface tiny defect detection is a difficult task in the integrated circuit industry, and the detection of tiny defects on PCBs with large, complex circuits has become one of its bottlenecks. To improve the performance of PCB surface tiny defect detection, a detection model based on an improved attention residual network (YOLOX-AttResNet) is proposed. First, the unsupervised clustering capability of the K-means algorithm is exploited to optimize the channel weights for subsequent operations by feeding the feature maps into the SENet (Squeeze and Excitation Network) attention network. Then the improved K-means-SENet network is fused with the direct mapping edges of the traditional ResNet to form an augmented residual network (AttResNet). Finally, the AttResNet module replaces the traditional ResNet structure in the backbone feature extraction network of mainstream detection models, improving the backbone's ability to extract small features. The results of ablation experiments on a PCB surface defect dataset show that AttResNet is a reliable and efficient module. To verify the performance of AttResNet for detecting small defects in large, complex circuit images, a series of comparison experiments is further performed. The results show that the AttResNet module combines well with five leading target detection frameworks (YOLOv3, YOLOX, Faster R-CNN, TDD-Net, Cascade R-CNN), and all of the combined models improve detection accuracy over the original models, which suggests that the AttResNet module helps the detection model extract target features. Among them, the YOLOX-AttResNet model proposed in this paper performs best, with the highest accuracy of 98.45% and a detection speed of 36 FPS (frames per second), which meets the accuracy and real-time requirements for detecting tiny defects on PCB surfaces. This study may provide new ideas for other real-time online detection tasks involving tiny targets in high-resolution images.
Detecting small moving objects in astronomical image sequences is a significant research problem in space surveillance. Compressive sensing, a new theory, provides a simple and computationally cheap coding scheme for onboard astronomical remote sensing. An algorithm for small moving space object detection and localization is proposed. The algorithm determines the measurements of objects by taking the difference between the measurements of the current image and the measurements of the background scene. Instead of reconstructing the whole image, only a foreground image is reconstructed, which leads to efficient computation and a high level of localization accuracy. Experiments and analysis are provided to show the performance of the proposed approach on detection and localization.
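The measurement-domain subtraction described above works because compressive measurements are linear: if y = Φx, then Φ(x_current) − Φ(x_background) equals Φ(x_current − x_background), the measurements of the foreground alone. The sketch below demonstrates only that identity. The Gaussian sensing matrix, the function names, and the flattened-image representation are assumptions, and the sparse reconstruction step that recovers the foreground image is omitted.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Random Gaussian m x n sensing matrix (an assumed choice of matrix)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(phi, image):
    """Compressive measurements y = phi @ image for a flattened image."""
    return [sum(p * v for p, v in zip(row, image)) for row in phi]


def foreground_measurements(y_current, y_background):
    """Measurements of the foreground, obtained without reconstructing either frame."""
    return [a - b for a, b in zip(y_current, y_background)]
```

Since a small moving object occupies few pixels, the foreground is far sparser than the full frame, which is what makes reconstructing it from few measurements plausible.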
Knowledge distillation is often used for model compression and has achieved a great breakthrough in image classification, but there remains scope for improvement in object detection, especially for knowledge extraction of small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to the down-sampling of the convolutional neural network (CNN), resulting in insufficient refinement of small object features during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before they are affected. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information from the deep layers of a teacher network, and then match the encapsulated information with the high-resolution feature values of small objects from shallow layers, which serve as the key. During this matching, we use an attention mechanism to measure the relevance of the query to the feature values. Knowledge is distilled to the student during decoding. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (with ResNet50 as the backbone), which is 3.8% higher than the baseline.
To address missed detections and low accuracy in detecting traffic signs in the wild, an improved YOLOv8 method is proposed. First, in view of the characteristics of small target objects in real scenes, blur and noise operations are added. Then, the asymptotic feature pyramid network (AFPN) is introduced to highlight the influence of key layer features after feature fusion and to enable direct interaction between non-adjacent layers. Experimental results on the TT100K dataset show that detection accuracy and recall are higher than those of YOLOv8.
Unexploded ordnance (UXO) poses a threat to soldiers operating in mission areas, but current UXO detection systems do not necessarily provide the safety and efficiency required to protect soldiers from this hazard. Recent technological advancements in artificial intelligence (AI) and small unmanned aerial systems (sUAS) present an opportunity to explore a novel concept for UXO detection. The new UXO detection system proposed in this study employs an AI-trained multi-spectral (MS) sensor on an sUAS. This paper explores the feasibility of AI-based UXO detection using an sUAS equipped with a single (visible) spectrum (SS) or MS digital electro-optical (EO) sensor. Specifically, it describes the design of the deep learning convolutional neural network for UXO detection and the development of an AI-based algorithm for reliable UXO detection, and it compares the performance of the proposed system on SS and MS sensor imagery.
A literature analysis has shown that object search, recognition, and tracking systems are becoming increasingly popular. However, such systems do not achieve good practical results when analyzing small moving living objects ranging from 8 to 14 mm. This article examines methods and tools for recognizing and tracking small moving objects such as ants. To that end, a customized You Only Look Once Ants Recognition (YOLO_AR) convolutional neural network (CNN) was trained to recognize Messor structor ants in the laboratory, using the LabelImg object marker tool. The proposed model is an extension of the You Only Look Once v4 (YOLOv4) 512×512 model with an additional Self Regularized Non-Monotonic (Mish) activation function. Additionally, a scalable solution for continuous object recognition and tracking was implemented. This solution is based on the OpenDataCam system, with extended object tracking modules that allow tracking and counting objects that have crossed a custom boundary line. During the study, the alignment algorithm used to find the trajectories of moving objects was modified. I discovered that the Hungarian algorithm showed better results in tracking small objects than the k-dimensional tree (k-d tree) matching algorithm used in OpenDataCam. Remarkably, this algorithm showed better results with the implemented YOLO_AR model due to the lack of false positives (FP). Therefore, I provided a new tracker module with a Hungarian matching algorithm, verified on the Multiple Object Tracking (MOT) benchmark. Furthermore, additional customization parameters for parsing and filtering object recognition and tracking results were added, such as the boundary angle threshold (BAT) and past frames trajectory prediction (PFTP). Experimental tests confirmed the results of the study on a mobile device. During the experiment, parameters such as the quality of recognition and tracking of moving objects, the PFTP and BAT, and the configuration parameters of the neural network and boundary line model were analyzed. The results showed that the proposed methods increased tracking accuracy by 50%. The study results confirmed the relevance of the topic and the effectiveness of the implemented methods and tools.
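To show what Hungarian matching contributes to tracking, the sketch below assigns existing tracks to new detections so that the total center-to-center distance is minimal, rather than letting each track greedily grab its nearest detection. It is a textbook implementation of the Hungarian (Kuhn-Munkres) method, not the tracker module from the study. The helper `match_tracks` and the use of Euclidean distance as the cost are hypothetical, and the code assumes there are at least as many detections as tracks.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list where result[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the new row.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_tracks(track_centers, detection_centers):
    """Hypothetical helper: match tracks to detections by center distance."""
    cost = [[math.dist(t, d) for d in detection_centers] for t in track_centers]
    return hungarian(cost)
```

With the cost matrix [[1, 2], [1, 10]], a greedy matcher gives row 0 its cheapest column and pays 1 + 10 = 11, while the optimal assignment pays 2 + 1 = 3. For densely packed small objects such as ants, this global optimum is what keeps identities from swapping.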
Military object detection and identification is a key capability in surveillance and reconnaissance. It is a major factor in warfare effectiveness and warfighter survivability. Inexpensive, portable, and rapidly deployable small unmanned aerial systems (sUAS), in conjunction with powerful deep learning (DL) based object detection models, are expected to play an important role in this application. To demonstrate the overall feasibility of this approach, this paper discusses aspects of designing and testing an automated detection system to locate and identify small firearms left at a training range or on the battlefield. Such a system is envisioned to involve an sUAS equipped with a modern electro-optical (EO) sensor and relying on a trained convolutional neural network (CNN). A previous study by the authors on finding projectiles on the ground revealed challenges such as small object size, changes in aspect ratio and image scale, motion blur, occlusion, and camouflage. This study attempts to deal with these challenges in a realistic operational scenario and goes further by not only detecting different types of firearms but also classifying them into categories. The study used a YOLOv2 CNN (ResNet-50 backbone network) trained with ground truth data and demonstrated a high mean average precision (mAP) of 0.97 in detecting and identifying not only small pistols but also partially occluded rifles.
Funding: This research was funded by the Natural Science Foundation of Hebei Province (F2021506004).
Abstract: Transformer-based models have facilitated significant advances in object detection. However, their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. Firstly, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. The query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model in UAV image object detection.
Abstract: Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses the attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the details of the small object, and fuse the enhanced features into the other feature layers, so that each layer is rich in small object information and small object detection accuracy improves. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MS COCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Funding: the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). This task is challenging due to variations in UAV flight altitude, differences in object scales, and factors such as flight speed and motion blur. To enhance the detection efficacy of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted as P2, to improve adaptability for small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of intricate feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the utilization of multi-scale feature information. On the VisDrone2023 and UAVDT datasets, MSC-YOLO achieves remarkable results, outperforming the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The MSC-YOLO algorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
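The space-to-depth idea mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the generic rearrangement that moves each 2×2 spatial block into four channels, compared with plain strided subsampling. It is not the CSPDC module itself, whose convolutional part the abstract does not specify. The function names `space_to_depth` and `strided_subsample` are hypothetical.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W grid into (H/block) x (W/block) cells,
    each holding block*block channel values (no pixel is discarded)."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Every pixel of the block survives as its own channel.
            channels = [image[top + dy][left + dx]
                        for dy in range(block) for dx in range(block)]
            row.append(channels)
        out.append(row)
    return out


def strided_subsample(image, stride=2):
    """Conventional stride-2 subsampling: keeps one pixel per block."""
    return [row[::stride] for row in image[::stride]]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

packed = space_to_depth(image)    # 2 x 2 cells, 4 channels each
kept = strided_subsample(image)   # 2 x 2 cells, 1 value each
print(packed)
print(kept)
```

Both outputs have half the spatial resolution, but the rearranged version keeps all 16 input values while subsampling keeps only 4, which is why this kind of rearrangement is attractive when objects span only a few pixels.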
Funding: supported in part by the National Natural Science Foundation of China under Grant 62006071 and in part by the Science and Technology Research Project of Henan Province under Grant 232103810086.
Abstract: In recent years, there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks. Despite these efforts, the detection of small objects in remote sensing remains a formidable challenge. Deep network structures cause a loss of object features, nearly eliminating some subtle features associated with small objects in deep layers. Additionally, the features of small objects are susceptible to interference from background features contained within the image, leading to a decline in detection accuracy. Moreover, the sensitivity of small objects to bounding box perturbation further increases the detection difficulty. In this paper, we introduce a novel approach, Cross-Layer Fusion and Weighted Receptive Field-based YOLO (CAW-YOLO), specifically designed for small object detection in remote sensing. To address feature loss in deep layers, we have devised a cross-layer attention fusion module. Background noise is effectively filtered through the incorporation of Bi-Level Routing Attention (BRA). To enhance the model's capacity to perceive multi-scale objects, particularly small-scale objects, we introduce a weighted multi-receptive field atrous spatial pyramid pooling module. Furthermore, we mitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance (NWD) and Efficient Intersection over Union (EIoU) losses. The efficacy of the proposed model in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets. The experimental results demonstrate the model's pronounced advantages in small object detection for remote sensing, surpassing the performance of current mainstream models.
Funding: Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. (Grant No. J2022004).
Abstract: Insulator defect detection plays a vital role in maintaining the secure operation of power systems. Insulator defects are hard to detect and easily missed because of their small and variable scale and fuzzy edge morphology. To address these issues, we construct an insulator dataset with 1600 samples containing flashovers and breakages. A simple and effective surface defect detection method for power line insulators, aimed at difficult small objects, is then proposed. Firstly, a high-resolution feature map is introduced and a small object prediction layer is added so that the model can detect tiny objects. Secondly, a simplified adaptive spatial feature fusion (SASFF) module is introduced to perform cross-scale spatial fusion and improve adaptability to variable multi-scale features. Finally, we propose an enhanced deformable attention mechanism (EDAM) module. By integrating a gating activation function, the model is further encouraged to learn a small number of critical sampling points near reference points, and the module improves the perception of object morphology. The experimental results indicate that, on the dataset of flashover and breakage defects, this method improves the performance of YOLOv5, YOLOv7, and YOLOv8. In practical application, it can simply and effectively improve the precision of power line insulator defect detection and reduce missed detections of difficult small objects.
Abstract: Infrared small target detection is a common task in infrared image processing. Under limited computational resources, traditional methods for infrared small target detection face a trade-off between the detection rate and the accuracy. A fast infrared small target detection method tailored for resource-constrained conditions is proposed for the YOLOv5s model. This method introduces an additional small target detection head and replaces the original Intersection over Union (IoU) metric with the Normalized Wasserstein Distance (NWD), while considering both the detection accuracy and the detection speed of infrared small targets. Experimental results demonstrate that the proposed algorithm achieves a maximum effective detection speed of 95 FPS on a 15 W TPU, while reaching a maximum effective detection accuracy of 91.9 AP@0.5, effectively improving the efficiency of infrared small target detection under resource-constrained conditions.
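To make the IoU-versus-NWD substitution concrete, here is a minimal sketch of both metrics for axis-aligned boxes given as `(cx, cy, w, h)`. It follows the usual NWD formulation, in which each box is modeled as a 2D Gaussian and the 2-Wasserstein distance between the Gaussians is mapped through an exponential. The normalizing constant `c` is dataset-dependent; the value 12.8 used here is only an illustrative assumption, and the abstract does not say which value the authors chose.

```python
import math


def iou(a, b):
    """Intersection over Union for boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance between two boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w/2, h/2); c is an assumed, dataset-dependent scale.
    """
    w2 = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                   + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-w2 / c)


# The same 4-pixel horizontal shift applied to a tiny and a large box.
small_a, small_b = (10, 10, 4, 4), (14, 10, 4, 4)
large_a, large_b = (10, 10, 40, 40), (14, 10, 40, 40)

print(iou(small_a, small_b), nwd(small_a, small_b))
print(iou(large_a, large_b), nwd(large_a, large_b))
```

For the 4×4 box the shift removes all overlap, so IoU drops to zero and gives no graded signal, while NWD still returns a smooth similarity. For the 40×40 box the same shift barely changes IoU. This sensitivity gap is the usual motivation for using NWD with very small targets.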
Abstract: In the study of oriented bounding box (OBB) object detection in high-resolution remote sensing images, missed and wrong detections of small targets occur because the targets are too small and have different orientations. Existing OBB object detection for remote sensing images, although making good progress, mainly focuses on directional modeling, while less consideration is given to the size of the object and to the problem of missed detection. In this study, a method based on an improved YOLOv8 is proposed for detecting oriented objects in remote sensing images, which improves the detection precision of such objects. Firstly, the ResCBAMG module was designed to better extract channel and spatial correlation information. Secondly, a top-down feature fusion layer network structure was proposed in conjunction with the Efficient Channel Attention (ECA) module, which helped to capture local cross-channel interaction information appropriately. Finally, we introduced a ResCBAMG module between the different C2f modules and detection heads of the bottom-up feature fusion layer. This structure helped the model to focus better on the target area and also improved the precision and robustness of oriented target detection. Experimental results on the DOTA-v1.5 dataset showed that the Precision, mAP@0.5, and mAP@0.5:0.95 metrics of the improved model are better than those of the original model. This improvement is effective in detecting small targets and in complex scenes.
Abstract: The detection of large-scale objects has achieved high accuracy, but the detection of small objects does not enjoy similar success because of their low peak signal-to-noise ratio (PSNR), fewer distinguishing features, and tendency to be occluded by their surroundings. To address this problem, this paper proposes an attention mechanism based on cross-Key values. Based on the traditional transformer, this paper first improves the feature processing with a convolution module, effectively maintaining the local semantic context in the middle layer and significantly reducing the number of model parameters. Then, to enhance the effectiveness of the attention mask, two Key values are calculated simultaneously along Query and Value using dual-branch parallel processing, which strengthens the attention acquisition mode and improves the coupling of key information. Finally, focusing on the feature maps of different channels, the multi-head attention mechanism is applied to the channel attention mask to improve the feature utilization of the middle layer. In comparisons on three small object datasets, our plug-and-play interactive transformer (IT-transformer) module effectively improves the detection results of the baseline.
Funding: the National Natural Science Foundation of China under Grants 62172059 and 62072055; Hunan Provincial Natural Science Foundation of China under Grant 2020JJ4626; Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; "Double First-class" International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25; the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
Abstract: Recently, object detection based on convolutional neural networks (CNNs) has developed rapidly. The backbone network for basic feature extraction is an important component of the whole detection task. Therefore, we present a new feature extraction strategy in this paper, named DSAFF-Net. In this strategy, we design: 1) a sandwich attention feature fusion module (SAFF module), whose purpose is to enhance the semantic information of shallow features and the resolution of deep features, which benefits small object detection after feature fusion; and 2) a new stage called D-block, which alleviates the loss of spatial resolution that occurs when the pooling layer increases the receptive field. The method proposed in the new stage replaces the original method of obtaining the P6 feature map and uses the result as the input of the region proposal network (RPN). In the experimental phase, we use the new strategy to extract features. The experiments use the public Microsoft Common Objects in Context (MS COCO) object detection dataset and a Corona Virus Disease 2019 (COVID-19) image classification dataset, respectively. The results show that the average recognition accuracy of COVID-19 in the classification dataset is improved to 98.163%, and small object detection in object detection tasks is improved by 4.0%.
Funding: This project was supported by the National Natural Science Foundation of China under Grant U1836208; the Hunan Provincial Natural Science Foundation of China under Grant 2020JJ4626; the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; the "Double First-class" International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25; the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
Abstract: In the past several years, remarkable achievements have been made in the field of object detection. Although performance is generally improving, the accuracy of small object detection remains low compared with that of large object detection. In addition, localization misalignment issues are common for small objects, as seen in GoogLeNets and residual networks (ResNets). To address this problem, we propose an improved region-based fully convolutional network (R-FCN). The presented technique improves detection accuracy and eliminates localization misalignment by replacing position-sensitive region of interest (PS-RoI) pooling with position-sensitive precise region of interest (PS-Pr-RoI) pooling, which avoids coordinate quantization and directly calculates two-order integrals for position-sensitive score maps, thus preventing a loss of spatial precision. A validation experiment was conducted in which the Microsoft Common Objects in Context (MS COCO) training dataset was oversampled. Results showed an accuracy improvement of 3.7% for object detection tasks and an increase of 6.0% for small objects.
Funding: National Defense Pre-research Fund Project (No. KMGY318002531).
Abstract: To solve the problem of small object detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird's nest on the high-voltage tower is taken as the research object. Firstly, we use the improved convolutional neural network ResNet101 to extract object features, and then use multi-scale sliding windows to obtain the object region proposals on the convolution feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected higher-resolution feature map, which is then taken as a feature mapping layer for the region proposals passed to the object detection sub-network. The detection results for bird's nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
Abstract: Object detection has been studied for many years, and convolutional neural networks have brought great progress in its accuracy and speed. However, because small objects have low resolution and fuzzy feature representations, one of the current challenges is how to detect small objects in images effectively. Existing detectors handle small objects in two ways: one uses high-resolution images as input, and the other increases the depth of the CNN. Both increase the computational cost and running time. In this paper, based on the RefineDet network framework, we propose a network structure, RF2Det, which introduces the Receptive Field Block to address small object detection while balancing speed and accuracy. At the same time, we propose a Medium-level Feature Pyramid Network, which combines appropriate high-level context features with low-level features, so that the network can use both low-level and high-level features for multi-scale target detection and the accuracy of small target detection based on low-level features is improved. Extensive experiments on the MS COCO dataset demonstrate that, compared to other state-of-the-art methods, our proposed method shows a significant performance improvement in the detection of small objects.
Funding: the Liaoning Provincial Department of Education 2021 Annual Scientific Research Funding Program (Grant Numbers LJKZ0535, LJKZ0526); the 2021 Annual Comprehensive Reform of Undergraduate Education Teaching (Grant Numbers JGLX2021020, JCLX2021008); Graduate Innovation Fund of Dalian Polytechnic University (Grant Number 2023CXYJ13).
Abstract: In pursuit of cost-effective manufacturing, enterprises are increasingly adopting recycled semiconductor chips. To ensure consistent chip orientation during packaging, a circular marker on the front side is used for pin alignment following successful functional testing. However, recycled chips often exhibit substantial surface wear, and identifying the relatively small marker proves challenging. Moreover, the complexity of generic target detection algorithms hampers seamless deployment. Addressing these issues, this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips, termed Van-YOLOv8. Initially, to alleviate the influence of diminutive, low-resolution markings on the precision of deep learning models, we use an upscaling approach for enhanced resolution. This technique relies on the Super-Resolution Generative Adversarial Network with Extended Training (SRGANext) network, which reconstructs high-fidelity images that align with input specifications. Subsequently, we replace the original YOLOv8s model's backbone feature extraction network with the lightweight Vanilla Network (VanillaNet), simplifying the branch structure to reduce network parameters. Finally, a Hybrid Attention Mechanism (HAM) is implemented to capture essential details from input images, improving feature representation while also speeding up model inference. Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in various aspects. Notably, it is superior in parameter count, computational complexity, target identification precision, and speed when compared to certain prevalent algorithms. The proposed approach is promising for real-time detection of recycled chips in practical factory settings.
Funding: supported by the National Natural Science Foundation of China (No. 61976083); Hubei Province Key R&D Program of China (No. 2022BBA0016).
Abstract: Printed Circuit Board (PCB) surface tiny defect detection is a difficult task in the integrated circuit industry, especially since the detection of tiny defects on PCB boards with large-size complex circuits has become one of the bottlenecks. To improve the performance of PCB surface tiny defect detection, a PCB tiny defect detection model based on an improved attention residual network (YOLOX-AttResNet) is proposed. First, the unsupervised clustering performance of the K-means algorithm is exploited to optimize the channel weights for subsequent operations by feeding the feature mapping into the SENet (Squeeze and Excitation Network) attention network; then the improved K-means-SENet network is fused with the directly mapped edges of the traditional ResNet network to form an augmented residual network (AttResNet); and finally, the AttResNet module is substituted for the traditional ResNet structure in the backbone feature extraction network of mainstream detection models, thus improving the ability of the detection backbone to extract small features. The results of ablation experiments on a PCB surface defect dataset show that AttResNet is a reliable and efficient module. To verify the performance of AttResNet in detecting small defects in large-size complex circuit images, a series of comparison experiments is further performed. The results show that the AttResNet module combines well with five of the best existing target detection frameworks (YOLOv3, YOLOX, Faster R-CNN, TDD-Net, Cascade R-CNN), and all the combined models have improved detection accuracy compared to the original models, which suggests that the AttResNet module proposed in this paper can help the detection model to extract target features. Among them, the YOLOX-AttResNet model proposed in this paper performs the best, with the highest accuracy of 98.45% and a detection speed of 36 FPS (frames per second), which meets the accuracy and real-time requirements for the detection of tiny defects on PCB surfaces. This study can provide some new ideas for other real-time online detection tasks involving tiny targets in high-resolution images.
Funding: supported by the National Natural Science Foundation of China (60903126); the China Postdoctoral Special Science Foundation (201003685); the China Postdoctoral Science Foundation (20090451397); the Northwestern Polytechnical University Foundation for Fundamental Research (JC201120).
Abstract: Detecting small moving objects in astronomical image sequences is a significant research problem in space surveillance. Compressive sensing, a new theory, provides a very simple and computationally cheap coding scheme for onboard astronomical remote sensing. An algorithm for small moving space object detection and localization is proposed. The algorithm determines the measurements of objects by taking the difference between the measurements of the current image and the measurements of the background scene. Instead of reconstructing the whole image, only a foreground image is reconstructed, which leads to efficient computation and a high level of localization accuracy. Experiments and analysis are provided to show the performance of the proposed approach on detection and localization.
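The measurement-differencing step described above rests on the linearity of compressive measurements: if y = Φx, then the difference between current and background measurements equals the measurement of the foreground alone. The sketch below checks that identity on toy data. The ±1 random measurement matrix, the image size, and the object position are all illustrative assumptions, and the sparse reconstruction step of the paper is not shown.

```python
import random


def measure(phi, x):
    """Compressive measurement y = phi @ x for a flattened image x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


random.seed(0)
n_pixels, n_measurements = 64, 16  # assumed toy sizes (16 measurements of 64 pixels)

# Assumed random +/-1 measurement matrix.
phi = [[random.choice((-1, 1)) for _ in range(n_pixels)]
       for _ in range(n_measurements)]

background = [random.randint(0, 20) for _ in range(n_pixels)]
current = list(background)
current[37] += 200  # one small bright object enters the scene

y_background = measure(phi, background)
y_current = measure(phi, current)

# Difference of measurements, computed without reconstructing any image.
y_diff = [c - b for c, b in zip(y_current, y_background)]

# By linearity this equals the measurement of the foreground image alone.
foreground = [c - b for c, b in zip(current, background)]
y_foreground = measure(phi, foreground)

energy = sum(v * v for v in y_diff)  # zero when nothing has moved
print(y_diff == y_foreground, energy)
```

Because the foreground is sparse (a single nonzero pixel here), only that sparse signal needs to be recovered from `y_diff`, which is the source of the computational saving the abstract describes.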
Funding: supported in part by the Joint Fund of the Ministry of Education for Equipment Pre-Research of China under Grant No. 8091B032257; the National Natural Science Foundation of China under Grant Nos. 62106232 and 62372415; the China Postdoctoral Science Foundation under Grant No. 2021TQ0301; the Outstanding Youth Science Fund of Henan Province of China under Grant No. 242300421050.
Abstract: Knowledge distillation is often used for model compression and has achieved a great breakthrough in image classification, but there is still scope for improvement in object detection, especially in knowledge extraction for small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to the down-sampling of the convolutional neural network (CNN), resulting in insufficient refinement of small object features during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before they are affected. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information from the deep layers of a teacher network, and then match the encapsulated information, used as the query, against high-resolution feature values of small objects from shallow layers, used as the key. During this matching, we use an attention mechanism to measure the relevance of the query to the feature values. During decoding, knowledge is distilled to the student. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (ResNet50 as the backbone), which is 3.8% higher than the baseline.
Abstract: To address missed detections and low accuracy in detecting traffic signs in the wild, an improved YOLOv8 method is proposed. Firstly, considering the characteristics of small target objects in real scenes, this paper adds blur and noise operations. Then, the asymptotic feature pyramid network (AFPN) is introduced to highlight the influence of key layer features after feature fusion and, at the same time, to enable direct interaction between non-adjacent layers. Experimental results on the TT100K dataset show that the detection accuracy and recall are higher than those of YOLOv8.
Funding: supported by the Office of Naval Research through the Consortium for Robotics and Unmanned Systems Education and Research.
Abstract: Unexploded ordnance (UXO) poses a threat to soldiers operating in mission areas, but current UXO detection systems do not necessarily provide the safety and efficiency required to protect soldiers from this hazard. Recent technological advancements in artificial intelligence (AI) and small unmanned aerial systems (sUAS) present an opportunity to explore a novel concept for UXO detection. The new UXO detection system proposed in this study takes advantage of an AI-trained multi-spectral (MS) sensor on an sUAS. This paper explores the feasibility of AI-based UXO detection using an sUAS equipped with a single (visible) spectrum (SS) or MS digital electro-optical (EO) sensor. Specifically, it describes the design of the Deep Learning Convolutional Neural Network for UXO detection and the development of an AI-based algorithm for reliable UXO detection, and it compares the performance of the proposed system on SS and MS sensor imagery.
Abstract: A literature analysis has shown that object search, recognition, and tracking systems are becoming increasingly popular. However, such systems do not achieve high practical results in analyzing small moving living objects ranging from 8 to 14 mm. This article examines methods and tools for recognizing and tracking small moving objects, such as ants. To this end, a customized You Only Look Once Ants Recognition (YOLO_AR) Convolutional Neural Network (CNN) has been trained to recognize Messor structor ants in the laboratory using the LabelImg object marker tool. The proposed model is an extension of the You Only Look Once v4 (YOLOv4) 512×512 model with an additional Self Regularized Non-Monotonic (Mish) activation function. Additionally, a scalable solution for continuous object recognition and tracking was implemented. This solution is based on the OpenDataCam system, with extended Object Tracking modules that allow for tracking and counting objects that have crossed a custom boundary line. During the study, the methods of the alignment algorithm for finding the trajectory of moving objects were modified. I discovered that the Hungarian algorithm showed better results in tracking small objects than the k-dimensional tree (k-d tree) matching algorithm used in OpenDataCam. Remarkably, this algorithm showed better results with the implemented YOLO_AR model due to the lack of False Positives (FP). Therefore, I provided a new tracker module with a Hungarian matching algorithm verified on the Multiple Object Tracking (MOT) benchmark. Furthermore, additional customization parameters for parsing and filtering object recognition and tracking results were added, such as the boundary angle threshold (BAT) and past frames trajectory prediction (PFTP). Experimental tests confirmed the results of the study on a mobile device. During the experiment, parameters such as the quality of recognition and tracking of moving objects, the PFTP and BAT, and the configuration parameters of the neural network and boundary line model were analyzed. The results showed that the proposed methods increased tracking accuracy by 50%. The study results confirmed the relevance of the topic and the effectiveness of the implemented methods and tools.
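The advantage of Hungarian matching in tracking comes from solving the track-to-detection assignment globally instead of pair by pair. The sketch below illustrates that on a toy cost matrix. For brevity it finds the optimal assignment by brute force over permutations, which yields the same result the Hungarian algorithm would compute but is practical only for a handful of objects. The greedy matcher is a simple stand-in for comparison and is not OpenDataCam's k-d tree matcher. All function names and the cost values are hypothetical.

```python
from itertools import permutations


def total_cost(cost, match):
    """Sum of costs over a {track_index: detection_index} assignment."""
    return sum(cost[t][d] for t, d in match.items())


def greedy_match(cost):
    """Repeatedly pair the cheapest remaining (track, detection)."""
    pairs = sorted((cost[t][d], t, d)
                   for t in range(len(cost)) for d in range(len(cost[0])))
    used_tracks, used_dets, match = set(), set(), {}
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_dets:
            match[t] = d
            used_tracks.add(t)
            used_dets.add(d)
    return match


def optimal_match(cost):
    """Minimum-total-cost assignment (the Hungarian algorithm's result),
    found here by brute force; assumes tracks <= detections."""
    n_tracks, n_dets = len(cost), len(cost[0])
    best = min(permutations(range(n_dets), n_tracks),
               key=lambda p: sum(cost[t][d] for t, d in enumerate(p)))
    return dict(enumerate(best))


# Rows are existing tracks, columns are new detections; entries are
# assumed pixel distances between predicted and detected positions.
cost = [[1, 2],
        [2, 10]]

greedy = greedy_match(cost)
optimal = optimal_match(cost)
print(greedy, total_cost(cost, greedy))
print(optimal, total_cost(cost, optimal))
```

The greedy matcher locks in the cheapest pair first and then forces the second track onto a distant detection, whereas the global assignment accepts a slightly worse first pairing to keep the total distance low. With many small, similar-looking objects moving close together, such locally attractive but globally poor pairings are a common cause of identity switches.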
Abstract: Military object detection and identification is a key capability in surveillance and reconnaissance. It is a major factor in warfare effectiveness and warfighter survivability. Inexpensive, portable, and rapidly deployable small unmanned aerial systems (sUAS) in conjunction with powerful deep learning (DL) based object detection models are expected to play an important role in this application. To prove the overall feasibility of this approach, this paper discusses some aspects of designing and testing an automated detection system to locate and identify small firearms left at the training range or on the battlefield. Such a system is envisioned to involve an sUAS equipped with a modern electro-optical (EO) sensor and relying on a trained convolutional neural network (CNN). A previous study by the authors devoted to finding projectiles on the ground revealed certain challenges such as small object size, changes in aspect ratio and image scale, motion blur, occlusion, and camouflage. This study attempts to deal with these challenges in a realistic operational scenario and goes further by not only detecting different types of firearms but also classifying them into different categories. This study used a YOLOv2 CNN (ResNet-50 backbone network) to train the model with ground truth data and demonstrated a high mean average precision (mAP) of 0.97 in detecting and identifying not only small pistols but also partially occluded rifles.