Transformer-based models have facilitated significant advances in object detection. However, their high computational cost and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. Firstly, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model in UAV image object detection.
In recent years, there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks. Despite these efforts, the detection of small objects in remote sensing remains a formidable challenge. Deep network structures cause a loss of object features, and some subtle features associated with small objects are nearly eliminated in deep layers. Additionally, the features of small objects are susceptible to interference from background features contained within the image, leading to a decline in detection accuracy. Moreover, the sensitivity of small objects to bounding box perturbation further increases the detection difficulty. In this paper, we introduce a novel approach, Cross-Layer Fusion and Weighted Receptive Field-based YOLO (CAW-YOLO), specifically designed for small object detection in remote sensing. To address feature loss in deep layers, we have devised a cross-layer attention fusion module. Background noise is filtered through the incorporation of Bi-Level Routing Attention (BRA). To enhance the model's capacity to perceive multi-scale objects, particularly small-scale objects, we introduce a weighted multi-receptive field atrous spatial pyramid pooling module. Furthermore, we mitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance (NWD) and Efficient Intersection over Union (EIoU) losses. The efficacy of the proposed model in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets. The experimental results demonstrate the model's pronounced advantages in small object detection for remote sensing, surpassing the performance of current mainstream models.
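To illustrate why a Wasserstein-based measure is less sensitive to small box perturbations than IoU, here is a minimal sketch of the commonly used NWD formulation, in which each box (cx, cy, w, h) is modelled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The function names and the constant `c` are assumptions made for this example; how CAW-YOLO combines NWD with EIoU is not specified in the abstract and is not reproduced here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent normalising constant; the default is an
    illustrative placeholder, not a value taken from the abstract.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def iou_cxcywh(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, used only for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# A 4x4 box and the same box shifted 4 px to the right.
a = (10.0, 10.0, 4.0, 4.0)
b = (14.0, 10.0, 4.0, 4.0)
print(iou_cxcywh(a, b))  # the boxes only touch, so IoU gives no signal
print(nwd(a, b))         # NWD still decays smoothly with the offset
```

For tiny boxes a shift of a few pixels drives IoU to zero, while NWD keeps a usable, smoothly varying similarity, which is the motivation usually given for using it on small objects.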
Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). This task is challenging due to variations in UAV flight altitude, differences in object scales, as well as factors like flight speed and motion blur. To enhance the detection efficacy of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted as P2, to improve adaptability for small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of intricate feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the utilization of multiscale feature information. On the VisDrone2023 and UAVDT datasets, MSC-YOLO achieves remarkable results, outperforming the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The MSC-YOLO algorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
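The idea behind replacing strided downsampling with a space-to-depth rearrangement can be sketched as follows. This shows only the generic space-to-depth step on a nested-list feature map; the function name, the channel ordering, and the block size are assumptions for illustration, and the actual CSPDC module (including its convolutions) is not described in enough detail in the abstract to reproduce.

```python
def space_to_depth(fmap, block=2):
    """Rearrange a C x H x W feature map into (C*block*block) x H/block x W/block.

    Every input value is kept: each block x block spatial neighbourhood is
    moved into the channel dimension instead of being pooled or skipped.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    if h % block or w % block:
        raise ValueError("H and W must be divisible by the block size")
    out = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            for ch in range(c):
                out.append([[fmap[ch][y][x] for x in range(dx, w, block)]
                            for y in range(dy, h, block)])
    return out


# One 4x4 channel becomes four 2x2 channels with no values discarded.
fmap = [[[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]]
halved = space_to_depth(fmap)
print(len(halved), len(halved[0]), len(halved[0][0]))
```

Unlike a stride-2 convolution or pooling layer, which discards or averages three of every four positions, this rearrangement halves the resolution while preserving all fine-grained values for the following convolution.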
Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the details of small objects, and fuse the enhanced features into other feature layers, ensuring that each layer is rich in small object information and improving small object detection accuracy. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MS COCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
In the past several years, remarkable achievements have been made in the field of object detection. Although performance is generally improving, the accuracy of small object detection remains low compared with that of large object detection. In addition, localization misalignment issues are common for small objects, as seen in GoogLeNets and residual networks (ResNets). To address this problem, we propose an improved region-based fully convolutional network (R-FCN). The presented technique improves detection accuracy and eliminates localization misalignment by replacing position-sensitive region of interest (PS-RoI) pooling with position-sensitive precise region of interest (PS-Pr-RoI) pooling, which avoids coordinate quantization and directly calculates two-order integrals for position-sensitive score maps, thus preventing a loss of spatial precision. A validation experiment was conducted in which the Microsoft Common Objects in Context (MS COCO) training dataset was oversampled. Results showed an accuracy improvement of 3.7% for object detection tasks and an increase of 6.0% for small objects.
Recently, object detection based on convolutional neural networks (CNNs) has developed rapidly. The backbone network used for basic feature extraction is an important component of the whole detection task. Therefore, we present a new feature extraction strategy in this paper, named DSAFF-Net. In this strategy, we design: 1) a sandwich attention feature fusion module (SAFF module), whose purpose is to enhance the semantic information of shallow features and the resolution of deep features, which benefits small object detection after feature fusion; and 2) a new stage called D-block, added to alleviate the loss of spatial resolution that occurs when pooling layers increase the receptive field. The method used in the new stage replaces the original way of obtaining the P6 feature map, and its result is used as the input of the region proposal network (RPN). In the experimental phase, we use the new strategy to extract features. The experiments use the public Microsoft Common Objects in Context (MS COCO) object detection dataset and a Corona Virus Disease 2019 (COVID-19) image classification dataset. The results show that the average recognition accuracy on the COVID-19 classification dataset is improved to 98.163%, and small object detection in object detection tasks is improved by 4.0%.
The detection of large-scale objects has achieved high accuracy, but the detection of small objects does not enjoy similar success because of their low peak signal-to-noise ratio (PSNR), fewer distinguishing features, and the ease with which they are occluded by their surroundings. To solve this problem, this paper proposes an attention mechanism based on cross-Key values. Building on the traditional transformer, this paper first improves the feature processing with a convolution module, effectively maintaining the local semantic context in the middle layer and significantly reducing the number of model parameters. Then, to enhance the effectiveness of the attention mask, two Key values are calculated simultaneously along Query and Value using dual-branch parallel processing, which strengthens the attention acquisition mode and improves the coupling of key information. Finally, focusing on the feature maps of different channels, the multi-head attention mechanism is applied to the channel attention mask to improve feature utilization in the middle layer. In comparisons on three small object datasets, the plug-and-play interactive transformer (IT-transformer) module we designed effectively improves the detection results of the baseline.
Knowledge distillation is often used for model compression and has achieved a great breakthrough in image classification, but there still remains scope for improvement in object detection, especially for knowledge extraction of small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to the down-sampling of the convolutional neural network (CNN), resulting in insufficient refinement of small object features during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before they are affected. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information, akin to eliciting knowledge from the deep layers of a teacher network, and then match the encapsulated information with high-resolution feature values of small objects from shallow layers as the key. During this period, we use an attention mechanism to measure the relevance of the query to the feature values. In the process of decoding, knowledge is also distilled to the student. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (ResNet50 as the backbone), which is 3.8% higher than that of the baseline.
Infrared small target detection is a common task in infrared image processing. Under limited computational resources, traditional methods for infrared small target detection face a trade-off between the detection rate and the accuracy. A fast infrared small target detection method tailored for resource-constrained conditions is proposed, based on the YOLOv5s model. This method introduces an additional small target detection head and replaces the original Intersection over Union (IoU) metric with Normalized Wasserstein Distance (NWD), while considering both the detection accuracy and the detection speed of infrared small targets. Experimental results demonstrate that the proposed algorithm achieves a maximum effective detection speed of 95 FPS on a 15 W TPU, while reaching a maximum effective detection accuracy of 91.9 AP@0.5, effectively improving the efficiency of infrared small target detection under resource-constrained conditions.
Object detection has been studied for many years. Convolutional neural networks have brought great progress in the accuracy and speed of object detection. However, due to the low resolution of small objects and their fuzzy feature representation, one of the remaining challenges is how to effectively detect small objects in images. Existing detectors for small objects take one of two approaches: using high-resolution images as input, or increasing the depth of the CNN. Both approaches increase computational cost and running time. In this paper, based on the RefineDet network framework, we propose the network structure RF2Det, which introduces the Receptive Field Block to address small object detection and achieve a balance of speed and accuracy. At the same time, we propose a Medium-level Feature Pyramid Network, which combines appropriate high-level context features with low-level features, so that the network can use both low-level and high-level features for multi-scale target detection, improving the accuracy of small target detection based on low-level features. Extensive experiments on the MS COCO dataset demonstrate that, compared to other state-of-the-art methods, our proposed method shows significant performance improvement in the detection of small objects.
With the advancement of society, science and technology, the demand for detecting small objects in practical scenarios is growing. Such objects are represented by only a small number of pixels, and their features are severely degraded after being extracted by a deep convolutional neural network, which is detrimental to detection performance for small objects. Therefore, an intuitive solution is to increase the resolution of small objects by cropping the original image. In this paper, we propose a simple but effective object density map guided region localization module (DMGRL) to locate and crop the regions of interest where small objects may exist. Firstly, the density map of the objects is estimated by an object density map estimation network, and the coordinates of the small object regions are then calculated. Secondly, a continuous, differentiable affine transformation is used to crop these regions, so that a detector with DMGRL can be trained end-to-end instead of in two stages. Finally, the prediction results for the input image and the cropped region images are merged by non-maximum suppression (NMS) to output the final detection results. Extensive experiments demonstrate the superior performance of detectors incorporating DMGRL.
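The final merging step can be illustrated with a minimal sketch: detections found in an upscaled crop are mapped back to full-image coordinates and then combined with the full-image detections by standard greedy NMS. All names, the (x1, y1, x2, y2) box format, the class-agnostic suppression, and the 0.5 threshold are assumptions made for this example rather than details taken from the paper.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def to_image_coords(det, crop_origin, scale):
    """Map a (box, score) detection from an upscaled crop to the full image."""
    (x1, y1, x2, y2), score = det
    ox, oy = crop_origin
    box = (ox + x1 / scale, oy + y1 / scale, ox + x2 / scale, oy + y2 / scale)
    return box, score


def nms(dets, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou_xyxy(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept


# One weak full-image detection, plus two detections from a crop whose
# top-left corner is at (90, 90) and which was upscaled by a factor of 2.
full_image_dets = [((100.0, 100.0, 110.0, 110.0), 0.4)]
crop_dets = [((20.0, 20.0, 40.0, 40.0), 0.9), ((60.0, 60.0, 70.0, 70.0), 0.8)]
mapped = [to_image_coords(d, crop_origin=(90.0, 90.0), scale=2.0) for d in crop_dets]
merged = nms(full_image_dets + mapped)
print(merged)
```

In this toy case the low-confidence full-image box coincides with a higher-scoring box recovered from the crop, so NMS keeps the crop's version and adds the second object that was only visible at the higher resolution.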
To solve the problem of small object detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird's nest on a high-voltage tower is taken as the research object. Firstly, we use an improved ResNet101 convolutional neural network to extract object features, and then use multi-scale sliding windows to obtain object region proposals on convolutional feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected higher-resolution feature map, which is then taken as the feature mapping layer through which the region proposals are passed to the object detection sub-network. The detection results for bird's nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
It is known that detecting small moving objects in astronomical image sequences is a significant research problem in space surveillance. The new theory of compressive sensing provides a very easy and computationally cheap coding scheme for onboard astronomical remote sensing. An algorithm for small moving space object detection and localization is proposed. The algorithm determines the measurements of objects by comparing the difference between the measurements of the current image and the measurements of the background scene. Instead of reconstructing the whole image, only a foreground image is reconstructed, which leads to efficient computation, and a high level of localization accuracy is achieved. Experiments and analysis are provided to show the performance of the proposed approach on detection and localization.
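The reason the difference of measurements suffices can be seen from the linearity of the sensing operator. The notation below is our own shorthand, assuming noiseless linear measurements with a fixed sensing matrix, and is not taken from the paper.

```latex
% Assumed notation: \Phi is the sensing matrix, x_t the current image,
% x_b the background scene, and f_t = x_t - x_b the foreground.
\begin{align}
  y_t &= \Phi x_t, \qquad y_b = \Phi x_b, \\
  y_t - y_b &= \Phi (x_t - x_b) = \Phi f_t .
\end{align}
```

Because a small moving object occupies few pixels, f_t is sparse, so it can be recovered from the difference of measurements alone without reconstructing the full image.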
In pursuit of cost-effective manufacturing, enterprises are increasingly adopting the practice of utilizing recycled semiconductor chips. To ensure consistent chip orientation during packaging, a circular marker on the front side is employed for pin alignment following successful functional testing. However, recycled chips often exhibit substantial surface wear, and identifying the relatively small marker proves challenging. Moreover, the complexity of generic target detection algorithms hampers seamless deployment. Addressing these issues, this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips, termed Van-YOLOv8. Initially, to alleviate the influence of small, low-resolution markings on the precision of deep learning models, we utilize an upscaling approach for enhanced resolution. This technique relies on the Super-Resolution Generative Adversarial Network with Extended Training (SRGANext) network, facilitating the reconstruction of high-fidelity images that align with input specifications. Subsequently, we replace the original YOLOv8s model's backbone feature extraction network with the lightweight Vanilla Network (VanillaNet), simplifying the branch structure to reduce network parameters. Finally, a Hybrid Attention Mechanism (HAM) is implemented to capture essential details from input images, improving feature representation while also increasing model inference speed. Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in various aspects. Notably, it is superior in parameter count, computational complexity, target identification precision, and speed when compared with several prevalent algorithms. The proposed approach is promising for real-time detection of recycled chips in practical factory settings.
To address missed detections and low accuracy when detecting traffic signs in the wild, an improved YOLOv8 method is proposed. Firstly, in view of the characteristics of small target objects in real scenes, blur and noise operations are added. Then, the asymptotic feature pyramid network (AFPN) is introduced to highlight the influence of key layer features after feature fusion, while also enabling direct interaction between non-adjacent layers. Experimental results on the TT100K dataset show that the detection accuracy and recall are higher than those of YOLOv8.
Due to their small size and often occult nature, metacarpophalangeal fractures are diagnosed with low accuracy in terms of fracture detection and localization in X-ray images. To efficiently detect metacarpophalangeal fractures on X-ray images as a second opinion for radiologists, we proposed a novel one-stage neural network named MPFracNet, based on RetinaNet. In MPFracNet, a deformable bottleneck block (DBB) was integrated into the bottleneck to better adapt to the geometric variation of the fractures. Furthermore, an integrated feature fusion module (IFFM) was employed to obtain more in-depth semantic and shallow detail features. Specifically, Focal Loss and Balanced L1 Loss were introduced to attenuate, respectively, the imbalance between positive and negative classes and the imbalance between the detection and localization tasks. We assessed the proposed model on the test set and achieved an AP of 80.4% for metacarpophalangeal fracture detection. To estimate the detection performance for fractures of different difficulty, the proposed model was tested on the metacarpal, phalangeal and tiny fracture test subsets and achieved APs of 82.7%, 78.5% and 74.9%, respectively. Our proposed framework achieves state-of-the-art performance for detecting metacarpophalangeal fractures and has strong potential application value in practical clinical environments.
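How Focal Loss attenuates the positive/negative imbalance can be shown with a minimal sketch of its standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The function name is hypothetical, and the defaults α = 0.25 and γ = 2 are the values commonly used with RetinaNet; the abstract does not state which settings MPFracNet uses.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, y: label (0 or 1).
    alpha and gamma are assumed defaults, not values from the abstract.
    """
    p = min(max(p, eps), 1.0 - eps)           # keep log() finite
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy, for comparison."""
    return -math.log(p if y == 1 else 1.0 - p)


easy_negative = focal_loss(0.01, 0)   # background confidently rejected
hard_positive = focal_loss(0.10, 1)   # fracture that the model is missing
print(easy_negative, cross_entropy(0.01, 0))
print(hard_positive, cross_entropy(0.10, 1))
```

The modulating factor (1 - p_t)^γ shrinks the loss of well-classified background locations by several orders of magnitude, so the many easy negatives no longer dominate the gradient relative to the few hard positives.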
Surface defects can affect the quality of steel plate. Many computer vision methods are currently applied to surface defect detection of steel plate. However, their real-time performance and their detection of small defects are still unsatisfactory. An improved object detection network based on You Only Look One-level Feature (YOLOF), called DLF-YOLOF, is proposed for surface defect detection of steel plate. First, an anchor-free detector is used to reduce the network hyperparameters. Secondly, a deformable convolution network and a local spatial attention module are introduced into the feature extraction network to increase the contextual information in the feature maps. Also, soft non-maximum suppression is used to improve detection accuracy. Finally, data augmentation is performed for small defect objects during training to improve detection accuracy. Experiments show that the average precision and the average precision for small objects are 42.7% and 33.5%, respectively, at a detection speed of 62 frames per second on a single GPU. This shows that DLF-YOLOF meets the needs of industrial real-time detection.
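Soft non-maximum suppression differs from standard NMS in that overlapping boxes are down-weighted rather than discarded. The sketch below uses the Gaussian decay variant; the function names, the (x1, y1, x2, y2) box format, and the values of `sigma` and `score_thr` are assumptions for illustration, and the abstract does not say which variant or settings DLF-YOLOF uses.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(dets, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS over a list of (box, score) pairs."""
    remaining = list(dets)
    kept = []
    while remaining:
        # Pick the current highest-scoring detection.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        if score < score_thr:
            break  # everything left scores even lower
        kept.append((box, score))
        # Decay the scores of the others according to their overlap.
        remaining = [(b, s * math.exp(-iou_xyxy(box, b) ** 2 / sigma))
                     for b, s in remaining]
    return kept


dets = [((0.0, 0.0, 10.0, 10.0), 0.9),    # strongest box
        ((1.0, 0.0, 11.0, 10.0), 0.8),    # heavy overlap with the first
        ((50.0, 50.0, 60.0, 60.0), 0.7)]  # separate defect
result = soft_nms(dets)
print(result)
```

With hard NMS at a 0.5 threshold the second box would be removed outright; here it survives with a reduced score, which helps when two genuine defects lie close together.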
To solve problems such as low helmet-wearing detection accuracy, missed detections, and the poor real-time performance of embedded equipment in scenes with remote and small targets at construction sites, this paper proposes an improved YOLO v5 for small-target helmet-wearing detection. Based on YOLO v5, a self-attention transformer mechanism and a Swin Transformer module are introduced in the feature fusion step to increase the receptive field of the convolution kernel and globally model the high-level semantic feature information extracted from the backbone network, making the model focus more on helmet feature learning. Some convolution operators are replaced with lighter and more efficient Involution operators to reduce the number of parameters. The connection mode of the Concat is improved, and a 1×1 convolution is added. Compared with YOLO v5, the experimental results show that the size of the improved helmet detection model is reduced by 17.8%, occupying only 33.2 MB, FPS is increased by 5%, and mAP@0.5 reaches 94.9%. This approach effectively improves the accuracy of small-target helmet-wearing detection and meets the deployment requirements of embedded devices with low computational power.
Smoking is the main cause of fires and pollution at petrol stations, construction sites and warehouses. Existing solutions based on wearable devices and smoke sensors are costly and make it hard to obtain evidence of smoking in unmanned scenarios. With the development of closed-circuit television (CCTV) systems, vision-based methods for object detection, mostly driven by deep learning techniques, have been introduced recently. However, the massive GPU computing hardware required by deep learning algorithms makes these methods hard to deploy. This paper aims to solve the smoking detection problem on the edge and proposes a solution with fast detection speed, high accuracy on micro-objects and a low computing budget, i.e., one that can be deployed on an edge device such as the NVIDIA Jetson TX2. We designed a new framework named RTVBS based on YOLOv3 and built a smoking dataset to train our model. We introduced several methods to improve detection accuracy during the training step. The validation results show that our model has excellent performance in smoking detection.
Detecting small objects is a challenging task. We focus on a special case: the detection and classification of traffic signals in street views. We present a novel framework that utilizes a visual attention model to make detection more efficient, without loss of accuracy, and which generalizes. The attention model is designed to generate a small set of candidate regions at a suitable scale so that small targets can be better located and classified. In order to evaluate our method in the context of traffic signal detection, we have built a traffic light benchmark with over 15,000 traffic light instances, based on Tencent street view panoramas. We have tested our method both on the dataset we have built and on the Tsinghua–Tencent 100K (TT100K) traffic sign benchmark. Experiments show that our method has superior detection performance and is quicker than the general Faster R-CNN object detection framework on both datasets. It is competitive with state-of-the-art specialist traffic sign detectors on TT100K, but is an order of magnitude faster. To show generality, we tested it on the LISA dataset without tuning, and obtained an average precision in excess of 90%.
Funding (H-DETR): This research was funded by the Natural Science Foundation of Hebei Province (F2021506004).
Funding (CAW-YOLO): Supported in part by the National Natural Science Foundation of China under Grant 62006071 and in part by the Science and Technology Research Project of Henan Province under Grant 232103810086.
Funding: The Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). This task is challenging due to variations in UAV flight altitude, differences in object scales, and factors such as flight speed and motion blur. To enhance the detection efficacy of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted as P2, to improve adaptability for small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of intricate feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the utilization of multiscale feature information. On the Visdrone2023 and UAVDT datasets, MSC-YOLO achieves remarkable results, outperforming the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The MSC-YOLO algorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
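To make the idea of replacing conventional downsampling concrete, the sketch below shows a generic space-to-depth rearrangement: each 2x2 spatial block is folded into the channel axis, so resolution halves without discarding any pixel values. This is only the general technique, written for nested Python lists; the function name and data layout are assumptions, and it is not the CSPDC module itself, whose internals the abstract does not describe.

```python
def space_to_depth(image, block=2):
    """Fold each block x block spatial patch into the channel axis.

    `image` is an H x W nested list whose elements are per-pixel channel
    lists of length C. The result has shape (H/block) x (W/block) with
    block*block*C channels per position, so no values are dropped.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            channels = []
            # Visit the patch in row-major order and concatenate channels.
            for di in range(block):
                for dj in range(block):
                    channels.extend(image[i + di][j + dj])
            row.append(channels)
        out.append(row)
    return out


if __name__ == "__main__":
    # 4x4 single-channel image whose pixel value is 4*row + col.
    img = [[[4 * r + c] for c in range(4)] for r in range(4)]
    for row in space_to_depth(img):
        print(row)
```

In contrast to strided convolution or pooling, every input value survives the rearrangement, which is why this family of operations is attractive when fine detail of small objects matters.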
Abstract: Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the information details of the small object, and fuse the enhanced features into other feature layers, ensuring that each layer is rich in small object information and thereby improving small object detection accuracy. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MS COCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Funding: This project was supported by the National Natural Science Foundation of China under Grant U1836208; the Hunan Provincial Natural Science Foundations of China under Grant 2020JJ4626; the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; the “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25; and the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
Abstract: In the past several years, remarkable achievements have been made in the field of object detection. Although performance is generally improving, the accuracy of small object detection remains low compared with that of large object detection. In addition, localization misalignment issues are common for small objects, as seen in GoogLeNets and residual networks (ResNets). To address this problem, we propose an improved region-based fully convolutional network (R-FCN). The presented technique improves detection accuracy and eliminates localization misalignment by replacing position-sensitive region of interest (PS-RoI) pooling with position-sensitive precise region of interest (PS-Pr-RoI) pooling, which avoids coordinate quantization and directly calculates two-order integrals for position-sensitive score maps, thus preventing a loss of spatial precision. A validation experiment was conducted in which the Microsoft Common Objects in Context (MS COCO) training dataset was oversampled. Results showed an accuracy improvement of 3.7% for object detection tasks and an increase of 6.0% for small objects.
Funding: The National Natural Science Foundation of China under Grants 62172059 and 62072055; Hunan Provincial Natural Science Foundations of China under Grant 2020JJ4626; Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25; the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
Abstract: Recently, object detection based on convolutional neural networks (CNNs) has developed rapidly. The backbone network for basic feature extraction is an important component of the whole detection task. Therefore, we present a new feature extraction strategy in this paper, named DSAFF-Net. In this strategy, we 1) design a sandwich attention feature fusion module (SAFF module), whose purpose is to enhance the semantic information of shallow features and the resolution of deep features, which benefits small object detection after feature fusion; and 2) add a new stage called D-block to alleviate the loss of spatial resolution that occurs when the pooling layer increases the receptive field. The method proposed in the new stage replaces the original method of obtaining the P6 feature map and uses the result as the input of the region proposal network (RPN). In the experimental phase, we use the new strategy to extract features. The experiments use the public Microsoft Common Objects in Context (MS COCO) object detection dataset and a Corona Virus Disease 2019 (COVID-19) image classification dataset. The results show that the average recognition accuracy of COVID-19 in the classification dataset is improved to 98.163%, and small object detection in object detection tasks is improved by 4.0%.
Abstract: The detection of large-scale objects has achieved high accuracy, but the detection of small objects has not enjoyed similar success because of their low peak signal-to-noise ratio (PSNR), fewer distinguishing features, and tendency to be occluded by their surroundings. To solve this problem, this paper proposes an attention mechanism based on cross-Key values. Based on the traditional transformer, this paper first improves the feature processing with a convolution module, effectively maintaining the local semantic context in the middle layer and significantly reducing the number of model parameters. Then, to enhance the effectiveness of the attention mask, two Key values are calculated simultaneously along Query and Value using dual-branch parallel processing, which strengthens the attention acquisition mode and improves the coupling of key information. Finally, focusing on the feature maps of different channels, the multi-head attention mechanism is applied to the channel attention mask to improve feature utilization in the middle layer. In comparisons on three small object datasets, the plug-and-play interactive transformer (IT-transformer) module we designed effectively improves the detection results of the baseline.
Funding: Supported in part by the Joint Fund of the Ministry of Education for Equipment Pre-Research of China under Grant No. 8091B032257; the National Natural Science Foundation of China under Grant Nos. 62106232 and 62372415; the China Postdoctoral Science Foundation under Grant No. 2021TQ0301; and the Outstanding Youth Science Fund of Henan Province of China under Grant No. 242300421050.
Abstract: Knowledge distillation is often used for model compression and has achieved a great breakthrough in image classification, but there remains scope for improvement in object detection, especially for knowledge extraction of small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to the down-sampling of the convolutional neural network (CNN), resulting in insufficient refinement of small object features during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before affecting. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information, akin to eliciting insights from profound strata within a teacher network, and then match the encapsulated information with high-resolution feature values of small objects from shallow layers as the key. During this period, we use an attention mechanism to measure the relevance of the inquiry to the feature values. In the process of decoding, knowledge is also distilled to the student. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (ResNet50 as the backbone), which is 3.8% higher than that of the baseline.
Abstract: Infrared small target detection is a common task in infrared image processing. Under limited computational resources, traditional methods for infrared small target detection face a trade-off between the detection rate and the accuracy. A fast infrared small target detection method tailored for resource-constrained conditions is proposed for the YOLOv5s model. This method introduces an additional small target detection head and replaces the original Intersection over Union (IoU) metric with Normalized Wasserstein Distance (NWD), while considering both the detection accuracy and the detection speed of infrared small targets. Experimental results demonstrate that the proposed algorithm achieves a maximum effective detection speed of 95 FPS on a 15 W TPU, while reaching a maximum effective detection accuracy of 91.9 AP@0.5, effectively improving the efficiency of infrared small target detection under resource-constrained conditions.
Abstract: Object detection has been studied for many years. Convolutional neural networks have made great progress in the accuracy and speed of object detection. However, due to the low resolution of small objects and their fuzzy feature representation, one of the current challenges is how to effectively detect small objects in images. Existing detectors for small objects take one of two approaches: using high-resolution images as input, or increasing the depth of the CNN. Both undoubtedly increase computational cost and run time. In this paper, based on the RefineDet network framework, we propose our network structure RF2Det by introducing the Receptive Field Block to solve the problem of small object detection and achieve a balance of speed and accuracy. At the same time, we propose a Medium-level Feature Pyramid Network, which combines appropriate high-level context features with low-level features, so that the network can use both low-level and high-level features for multi-scale target detection, and the accuracy of the small target detection task based on low-level features is improved. Extensive experiments on the MS COCO dataset demonstrate that, compared to other state-of-the-art methods, our proposed method shows significant performance improvement in the detection of small objects.
Funding: Supported by the National Center ATC Surveillance and Communication System Engineering Research.
Abstract: With the advancement of society and of science and technology, the demand for detecting small objects in practical scenarios is growing. Such objects are represented by only a small number of pixels, and their features are severely degraded after being extracted by a deep convolutional neural network, which is detrimental to the detection performance for small objects. Therefore, an intuitive solution is to increase the resolution of small objects by cropping the original image. In this paper, we propose a simple but effective object density map guided region localization module (DMGRL) to locate and crop the regions of interest where small objects may exist. Firstly, the density map of the objects is estimated by an object density map estimation network, and then the coordinates of the small object regions are calculated. Secondly, a continuous differentiable affine transformation is utilized to crop these regions so that the detector with DMGRL can be trained end-to-end instead of in two stages. Finally, all prediction results of the input image and the cropped region images are merged to output the final detection results by non-maximum suppression (NMS). Extensive experiments demonstrate the superior performance of the detector incorporating DMGRL.
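The final merging step described above can be sketched as follows: detections from a cropped region are mapped back to full-image coordinates and then pooled with the full-image detections before greedy NMS. The crop description (a top-left `origin` plus a uniform `scale` factor), the helper names, and the 0.5 IoU threshold are assumptions made for illustration; the abstract does not specify the affine parameters or the threshold used by DMGRL.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def to_image_coords(box, origin, scale):
    """Map a box from an upscaled crop back to the original image.

    `origin` is the crop's top-left corner in the image and `scale` is the
    (hypothetical) uniform magnification applied to the crop.
    """
    ox, oy = origin
    x1, y1, x2, y2 = box
    return (ox + x1 / scale, oy + y1 / scale, ox + x2 / scale, oy + y2 / scale)


def nms(detections, iou_thr=0.5):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best first."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    full_image = [((10, 10, 20, 20), 0.6)]
    crop = [((4, 4, 24, 24), 0.9), ((40, 40, 50, 50), 0.7)]
    mapped = [(to_image_coords(b, origin=(8, 8), scale=2), s) for b, s in crop]
    print(nms(full_image + mapped))
```

Here the crop's higher-scoring detection of the same object replaces the weaker full-image one, while the object visible only in the crop is retained.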
Funding: National Defense Pre-research Fund Project (No. KMGY318002531).
Abstract: To solve the problem of small object detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird’s nest on a high-voltage tower is taken as the research object. Firstly, we use the improved convolutional neural network ResNet101 to extract object features, and then use multi-scale sliding windows to obtain object region proposals on convolution feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected higher-resolution feature map, which is then taken as the feature mapping layer of the region proposals passed to the object detection sub-network. The detection results for bird’s nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
Funding: Supported by the National Natural Science Foundation of China (60903126); the China Postdoctoral Special Science Foundation (201003685); the China Postdoctoral Science Foundation (20090451397); the Northwestern Polytechnical University Foundation for Fundamental Research (JC201120).
Abstract: Detecting small moving objects in astronomical image sequences is a significant research problem in space surveillance. The new theory of compressive sensing provides a very easy and computationally cheap coding scheme for onboard astronomical remote sensing. An algorithm for small moving space object detection and localization is proposed. The algorithm determines the measurements of objects by comparing the measurements of the current image with the measurements of the background scene. Rather than reconstructing the whole image, only a foreground image is reconstructed, which leads to effective computational performance and a high level of localization accuracy. Experiments and analysis are provided to show the performance of the proposed approach on detection and localization.
Funding: The Liaoning Provincial Department of Education 2021 Annual Scientific Research Funding Program (Grant Numbers LJKZ0535, LJKZ0526); the 2021 Annual Comprehensive Reform of Undergraduate Education Teaching (Grant Numbers JGLX2021020, JCLX2021008); Graduate Innovation Fund of Dalian Polytechnic University (Grant Number 2023CXYJ13).
Abstract: In pursuit of cost-effective manufacturing, enterprises are increasingly adopting the practice of utilizing recycled semiconductor chips. To ensure consistent chip orientation during packaging, a circular marker on the front side is employed for pin alignment following successful functional testing. However, recycled chips often exhibit substantial surface wear, and the identification of the relatively small marker proves challenging. Moreover, the complexity of generic target detection algorithms hampers seamless deployment. Addressing these issues, this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips, termed Van-YOLOv8. Initially, to alleviate the influence of diminutive, low-resolution markings on the precision of deep learning models, we utilize an upscaling approach for enhanced resolution. This technique relies on the Super-Resolution Generative Adversarial Network with Extended Training (SRGANext) network, facilitating the reconstruction of high-fidelity images that align with input specifications. Subsequently, we replace the original YOLOv8s model’s backbone feature extraction network with the lightweight Vanilla Network (VanillaNet), simplifying the branch structure to reduce network parameters. Finally, a Hybrid Attention Mechanism (HAM) is implemented to capture essential details from input images, improving feature representation while concurrently expediting model inference speed. Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in various aspects. Significantly, it demonstrates superiority in parameter count, computational intricacy, precision in identifying targets, and speed when compared to certain prevalent algorithms in the current landscape. The proposed approach proves promising for real-time detection of recycled chips in practical factory settings.
Abstract: To address missed detections and low accuracy when detecting traffic signs in the wild, an improved YOLOv8 method is proposed. Firstly, given the characteristics of small target objects in real scenes, this paper adds blur and noise operations. Then, the asymptotic feature pyramid network (AFPN) is introduced to highlight the influence of key-layer features after feature fusion and, at the same time, to support direct interaction between non-adjacent layers. Experimental results on the TT100K dataset show that the detection accuracy and recall are higher than those of YOLOv8.
Funding: Funded by the Research Fund for Foundation of Hebei University (DXK201914); the President of Hebei University (XZJJ201914); the Post-graduate’s Innovation Fund Project of Hebei University (HBU2022SS003); the Special Project for Cultivating College Students’ Scientific and Technological Innovation Ability in Hebei Province (22E50041D).
Abstract: Because metacarpophalangeal fractures are small and often occult, their diagnosis shows low accuracy in terms of fracture detection and localization in X-ray images. To efficiently detect metacarpophalangeal fractures on X-ray images as a second opinion for radiologists, we proposed a novel one-stage neural network named MPFracNet based on RetinaNet. In MPFracNet, a deformable bottleneck block (DBB) was integrated into the bottleneck to better adapt to the geometric variation of the fractures. Furthermore, an integrated feature fusion module (IFFM) was employed to obtain more in-depth semantic and shallow detail features. Specifically, Focal Loss and Balanced L1 Loss were introduced to attenuate, respectively, the imbalance between positive and negative classes and the imbalance between detection and localization tasks. We assessed the proposed model on the test set and achieved an AP of 80.4% for metacarpophalangeal fracture detection. To estimate the detection performance for fractures of different difficulty, the proposed model was tested on the metacarpal, phalangeal, and tiny fracture test subsets and achieved APs of 82.7%, 78.5%, and 74.9%, respectively. Our proposed framework has state-of-the-art performance for detecting metacarpophalangeal fractures and has strong potential application value in practical clinical environments.
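As a minimal sketch of how Focal Loss attenuates the positive/negative class imbalance mentioned above, the function below evaluates the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) for a single prediction. The function name and the defaults `alpha=0.25`, `gamma=2.0` are illustrative assumptions; the abstract does not report the hyperparameters used in MPFracNet, and Balanced L1 Loss is not shown.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, strictly in (0, 1).
    y: ground-truth label, 1 (positive) or 0 (negative).
    The factor (1 - p_t)^gamma shrinks the loss of well-classified examples,
    so the many easy negatives contribute little to the total.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print("easy negative:", focal_loss(0.01, 0))
    print("hard positive:", focal_loss(0.10, 1))
```

With `gamma=0` the expression reduces to alpha-weighted cross-entropy, which makes the down-weighting effect of the modulating factor easy to check in isolation.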
Funding: Supported by the Natural Science Foundation of Liaoning Province (No. 2022-MS-353) and the Basic Scientific Research Project of Education Department of Liaoning Province (Nos. 2020LNZD06 and LJKMZ20220640).
Abstract: Surface defects can affect the quality of steel plate. Many methods based on computer vision are currently applied to surface defect detection of steel plate. However, their real-time performance and their detection of small defects are still unsatisfactory. An improved object detection network based on You Only Look One-level Feature (YOLOF), called DLF-YOLOF, is proposed for surface defect detection of steel plate. First, an anchor-free detector is used to reduce the network hyperparameters. Secondly, a deformable convolution network and a local spatial attention module are introduced into the feature extraction network to increase the contextual information in the feature maps. Also, soft non-maximum suppression is used to improve detection accuracy significantly. Finally, data augmentation is performed for small defect objects during training to improve detection accuracy. Experiments show that the average precision and the average precision for small objects are 42.7% and 33.5%, respectively, at a detection speed of 62 frames per second on a single GPU. This shows that DLF-YOLOF has excellent performance and meets the needs of industrial real-time detection.
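The soft non-maximum suppression step named above can be sketched as follows. Instead of deleting every box that overlaps a higher-scoring one, the overlapping box's score is decayed, here with a Gaussian penalty exp(-IoU^2 / sigma). Which variant and which hyperparameters DLF-YOLOF uses is not stated in the abstract, so the Gaussian form, `sigma=0.5`, `score_thr=0.001`, and the function names are assumptions for illustration.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(detections, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS over (box, score) pairs.

    Repeatedly selects the highest-scoring remaining box, then decays the
    scores of the others according to their overlap with it. Boxes whose
    score falls below `score_thr` are discarded.
    """
    remaining = list(detections)
    kept = []
    while remaining:
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        decayed = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_thr:
                decayed.append((box, new_score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9), ((1, 0, 11, 10), 0.8), ((50, 50, 60, 60), 0.7)]
    for box, score in soft_nms(dets):
        print(box, round(score, 4))
```

The heavily overlapping second box survives with a reduced score rather than being removed outright, which is the property that helps when adjacent small defects overlap.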
Abstract: To solve problems such as low helmet-wearing detection accuracy, missed detections, and poor real-time performance on embedded equipment for remote and small targets at construction sites, this paper proposes an improved YOLO v5 for small target helmet-wearing detection. Based on YOLO v5, the self-attention transformer mechanism and the swin transformer module are introduced in the feature fusion step to increase the receptive field of the convolution kernel and to globally model the high-level semantic feature information extracted from the backbone network, making the model more focused on helmet feature learning. Some convolution operators are replaced with lighter and more efficient Involution operators to reduce the number of parameters. The connection mode of the Concat is improved, and a 1×1 convolution is added. Compared with YOLO v5, the experimental results show that the size of the improved helmet detection model is reduced by 17.8%, occupying only 33.2 MB; FPS increases by 5%; and mAP@0.5 reaches 94.9%. This approach effectively improves the accuracy of small target helmet-wearing detection and meets the deployment requirements of embedded devices with low computational power.
Abstract: Smoking is the main cause of fires and pollution in petrol stations, construction sites, and warehouses. Existing solutions based on wearable devices and smoking sensors are costly and make it hard to obtain evidence of smoking in unmanned scenarios. With the development of closed-circuit television (CCTV) systems, vision-based methods for object detection, mostly driven by deep learning techniques, have been introduced recently. However, the massive GPU computing hardware required by deep learning algorithms makes these methods hard to deploy. This paper aims to solve the smoking detection problem on the edge and proposes a solution with fast detection speed, high accuracy on micro-objects, and a low computing budget, i.e., one that can be deployed on an edge device such as the NVIDIA JETSON TX2. We designed a new framework named RTVBS based on YOLOv3 and built a smoking dataset to train our model. We introduced several methods to improve detection accuracy during training. The validation results show that our model has excellent performance in smoking detection.
Funding: Supported by the National Natural Science Foundation of China (No. 61772298), a Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua–Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: Detecting small objects is a challenging task. We focus on a special case: the detection and classification of traffic signals in street views. We present a novel framework that utilizes a visual attention model to make detection more efficient, without loss of accuracy, and which generalizes. The attention model is designed to generate a small set of candidate regions at a suitable scale so that small targets can be better located and classified. In order to evaluate our method in the context of traffic signal detection, we have built a traffic light benchmark with over 15,000 traffic light instances, based on Tencent street view panoramas. We have tested our method both on the dataset we have built and on the Tsinghua–Tencent 100K (TT100K) traffic sign benchmark. Experiments show that our method has superior detection performance and is quicker than the general Faster R-CNN object detection framework on both datasets. It is competitive with state-of-the-art specialist traffic sign detectors on TT100K, but is an order of magnitude faster. To show generality, we tested it on the LISA dataset without tuning, and obtained an average precision in excess of 90%.