Insulator defect detection plays a vital role in maintaining the secure operation of power systems.To address the issues of the difficulty of detecting small objects and missing objects due to the small scale,variable...Insulator defect detection plays a vital role in maintaining the secure operation of power systems.To address the issues of the difficulty of detecting small objects and missing objects due to the small scale,variable scale,and fuzzy edge morphology of insulator defects,we construct an insulator dataset with 1600 samples containing flashovers and breakages.Then a simple and effective surface defect detection method of power line insulators for difficult small objects is proposed.Firstly,a high-resolution featuremap is introduced and a small object prediction layer is added so that the model can detect tiny objects.Secondly,a simplified adaptive spatial feature fusion(SASFF)module is introduced to perform cross-scale spatial fusion to improve adaptability to variable multi-scale features.Finally,we propose an enhanced deformable attention mechanism(EDAM)module.By integrating a gating activation function,the model is further inspired to learn a small number of critical sampling points near reference points.And the module can improve the perception of object morphology.The experimental results indicate that concerning the dataset of flashover and breakage defects,this method improves the performance of YOLOv5,YOLOv7,and YOLOv8.In practical application,it can simply and effectively improve the precision of power line insulator defect detection and reduce missing detection for difficult small objects.展开更多
Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computati...Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.展开更多
Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materia...Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materials and method: This was a descriptive and analytical study over a 48-month period at CHU la Renaissance from January 1, 2018 to December 31, 2021, concerning patients admitted for penetrating cranioencephalic trauma by pointed object. Results: Twelve cases, all male, of penetrating cranioencephalic sharp-force trauma were identified. The mean age was 34 ± 7 years, with extremes of 11 and 60 years. Farmers and herders accounted for 31% and 25% of cases respectively. The average admission time was 47 hours. Brawls were the circumstances of occurrence in 81.2% of cases. Knives (33%), arrows (25%) and iron bars (16.6%) were the objects used. Altered consciousness was present in 43.8% of cases, and focal deficit in 50%. Scannographic lesions were fracture and/or embarrhment (12 cases), intra-parenchymal haematomas (6 cases) and presence of object in place (4 cases). Surgery was performed in 11 patients. Postoperative outcome was favorable in 9 patients. After 12 months, 2 patients were declared unfit. Conclusion: Penetrating head injuries caused by sharp objects are common in Chad. Urgent surgery can prevent disabling after-effects.展开更多
This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how...This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how certain urban objects can act as emotional agents and how these events affect the urban system as a whole.An adaptive complex systems perspective is used to analyze these patterns.The results show patterns in the processes and dynamics that occur in cities based on the objects that affect the emotions of the people who live there.These patterns depend on the characteristics of the emotional charge of urban objects,but they can be generalized in the following process:(1)immediate reaction by some individuals;(2)emotions are generated at the individual level which begins to generalize,permuting to a collective emotion;(3)a process of reflection is detonated in some individuals from the reading of collective emotions;(4)integration/significance in the community both at the individual and collective level,on the concepts,roles and/or functions that give rise to the process in the system.Therefore,it is clear that emotions play a significant role in the development of cities and these aspects should be considered in the design strategies of all kinds of projects for the city.Future extensions of this work could include a deeper analysis of specific emotional events in urban environments,as well as possible implications for urban policy and decision making.展开更多
In this paper,the open-sourced computational fluid dynamics software,OpenFOAM~?,is used to study the fluctuation phenomenon of the water body inside a horizontally one-dimensional enclosed harbor basin with constant w...In this paper,the open-sourced computational fluid dynamics software,OpenFOAM~?,is used to study the fluctuation phenomenon of the water body inside a horizontally one-dimensional enclosed harbor basin with constant water depth triggered by falling wedges with various horizontal falling positions,initial falling velocities and masses.Based on both Fourier transfo rm analysis and wavelet spectrum analysis for the time series of the free surface elevations inside the harbor basin,it is found for the first time that the wedge falling inside the harbor can directly trigger harbor resonance.The influences of the three factors(including the horizontal falling position,the initial falling velocity,and the mass)on the response amplitudes of the lowest three resonant modes are also investigated.The results show that when the wedge falls on one of the nodal points of a resonant mode,the mode would be remarkably suppressed.Conversely,when the wedge falls on one of the anti-nodal points of a resonant mode,the mode would be evidently triggered.The initial falling velocity of the wedge mainly has a remarkable effect on the response amplitude of the most significant mode,and the latter shows a gradual increase trend with the increase of the former.While for the other two less significant modes,their response amplitudes fluctuate around certain constant values as the initial falling velocity rises.In general,the response amplitudes of all the lowest three modes are shown to gradually increase with the mass of the wedge.展开更多
Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unman...Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection.展开更多
The amount of needed control messages in wireless sensor networks(WSN)is affected by the storage strategy of detected events.Because broadcasting superfluous control messages consumes excess energy,the network lifespa...The amount of needed control messages in wireless sensor networks(WSN)is affected by the storage strategy of detected events.Because broadcasting superfluous control messages consumes excess energy,the network lifespan can be extended if the quantity of control messages is decreased.In this study,an optimized storage technique having low control overhead for tracking the objects in WSN is introduced.The basic concept is to retain observed events in internal memory and preserve the relationship between sensed information and sensor nodes using a novel inexpensive data structure entitled Ordered Binary Linked List(OBLL).Whenever an object passes over the sensor area,the recognizing sensor can immediately produce an OBLL along the object’s route.To retrieve the entire information,the OBLL can be traversed with logarithmic complexity which is much less than the traversing complexity of existing linked list structures.Performance evaluation and simulations were carried out to ensure that the suggested technique minimizes the number of messages and thus saving energy and extending the network life.展开更多
Three-dimensional(3D) scanning technology has undergone remarkable developments in recent years.Data acquired by 3D scanning have the form of 3D point clouds.The 3D scanned point clouds have data sizes that can be con...Three-dimensional(3D) scanning technology has undergone remarkable developments in recent years.Data acquired by 3D scanning have the form of 3D point clouds.The 3D scanned point clouds have data sizes that can be considered big data.They also contain measurement noise inherent in measurement data.These properties of 3D scanned point clouds make many traditional CG/visualization techniques difficult.This paper reviewed our recent achievements in developing varieties of high-quality visualizations suitable for the visual analysis of 3D scanned point clouds.We demonstrated the effectiveness of the method by applying the visualizations to various cultural heritage objects.The main visualization targets used in this paper are the floats in the Gion Festival in Kyoto(the float parade is on the UNESCO Intangible Cultural Heritage List) and Borobudur Temple in Indonesia(a UNESCO World Heritage Site).展开更多
The transmission of video content over a network raises various issues relating to copyright authenticity,ethics,legality,and privacy.The protection of copyrighted video content is a significant issue in the video ind...The transmission of video content over a network raises various issues relating to copyright authenticity,ethics,legality,and privacy.The protection of copyrighted video content is a significant issue in the video industry,and it is essential to find effective solutions to prevent tampering and modification of digital video content during its transmission through digital media.However,there are stillmany unresolved challenges.This paper aims to address those challenges by proposing a new technique for detectingmoving objects in digital videos,which can help prove the credibility of video content by detecting any fake objects inserted by hackers.The proposed technique involves using two methods,the H.264 and the extraction color features methods,to embed and extract watermarks in video frames.The study tested the performance of the system against various attacks and found it to be robust.The evaluation was done using different metrics such as Peak-Signal-to-Noise Ratio(PSNR),Mean Squared Error(MSE),Structural Similarity Index Measure(SSIM),Bit Correction Ratio(BCR),and Normalized Correlation.The accuracy of identifying moving objects was high,ranging from 96.3%to 98.7%.The system was also able to embed a fragile watermark with a success rate of over 93.65%and had an average capacity of hiding of 78.67.The reconstructed video frames had high quality with a PSNR of at least 65.45 dB and SSIMof over 0.97,making them imperceptible to the human eye.The system also had an acceptable average time difference(T=1.227/s)compared with other state-of-the-art methods.展开更多
Advances in machine vision systems have revolutionized applications such as autonomous driving,robotic navigation,and augmented reality.Despite substantial progress,challenges persist,including dynamic backgrounds,occ...Advances in machine vision systems have revolutionized applications such as autonomous driving,robotic navigation,and augmented reality.Despite substantial progress,challenges persist,including dynamic backgrounds,occlusion,and limited labeled data.To address these challenges,we introduce a comprehensive methodology toenhance image classification and object detection accuracy.The proposed approach involves the integration ofmultiple methods in a complementary way.The process commences with the application of Gaussian filters tomitigate the impact of noise interference.These images are then processed for segmentation using Fuzzy C-Meanssegmentation in parallel with saliency mapping techniques to find the most prominent regions.The Binary RobustIndependent Elementary Features(BRIEF)characteristics are then extracted fromdata derived fromsaliency mapsand segmented images.For precise object separation,Oriented FAST and Rotated BRIEF(ORB)algorithms areemployed.Genetic Algorithms(GAs)are used to optimize Random Forest classifier parameters which lead toimproved performance.Our method stands out due to its comprehensive approach,adeptly addressing challengessuch as changing backdrops,occlusion,and limited labeled data concurrently.A significant enhancement hasbeen achieved by integrating Genetic Algorithms(GAs)to precisely optimize parameters.This minor adjustmentnot only boosts the uniqueness of our system but also amplifies its overall efficacy.The proposed methodologyhas demonstrated notable classification accuracies of 90.9%and 89.0%on the challenging Corel-1k and MSRCdatasets,respectively.Furthermore,detection accuracies of 87.2%and 86.6%have been attained.Although ourmethod performed well in both datasets it may face difficulties in real-world data especially where datasets havehighly complex backgrounds.Despite these limitations,GAintegration for parameter optimization shows a notablestrength in enhancing the overall adaptability and performance of our system.展开更多
Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects an...Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection.Consequently,devising algorithms capable of accurately localizing specific objects within a scene in scenarios where annotated data is limited remains a formidable challenge.To solve this problem,this paper proposes an object discovery by request problem setting and a corresponding algorithmic framework.The proposed problem setting aims to identify specified objects in scenes,and the associated algorithmic framework comprises pseudo data generation and object discovery by request network.Pseudo-data generation generates images resembling natural scenes through various data augmentation rules,using a small number of object samples and scene images.The network structure of object discovery by request utilizes the pre-trained Vision Transformer(ViT)model as the backbone,employs object-centric methods to learn the latent representations of foreground objects,and applies patch-level reconstruction constraints to the model.During the validation phase,we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets.Experiments have proved that our method achieves state-of-the-art performance on Unmanned Aerial Vehicles-Bottle Detection(UAV-BD)dataset and self-constructed dataset Bottle,especially in multi-object scenarios.展开更多
Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain les...Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain less information.Many current methods,particularly those based on Feature Pyramid Network(FPN),address this challenge by leveraging multi-scale feature fusion.However,existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers,leading to suboptimal small object detection.To address this problem,we propose the Two-layerAttention Feature Pyramid Network(TA-FPN),featuring two key modules:the Two-layer Attention Module(TAM)and the Small Object Detail Enhancement Module(SODEM).TAM uses the attention module to make the network more focused on the semantic information of the object and fuse it to the lower layer,so that each layer contains similar semantic information,to alleviate the problem of small object information being submerged due to semantic gaps between different layers.At the same time,SODEM is introduced to strengthen the local features of the object,suppress background noise,enhance the information details of the small object,and fuse the enhanced features to other feature layers to ensure that each layer is rich in small object information,to improve small object detection accuracy.Our extensive experiments on challenging datasets such as Microsoft Common Objects inContext(MSCOCO)and Pattern Analysis Statistical Modelling and Computational Learning,Visual Object Classes(PASCAL VOC)demonstrate the validity of the proposedmethod.Experimental results show a significant improvement in small object detection accuracy compared to state-of-theart detectors.展开更多
The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection ...The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection in the field of blasting.Serverless Computing can provide a variety of computing services for people without hardware foundations and rich software development experience,which has aroused people’s interest in how to use it in the field ofmachine learning.In this paper,we design a distributedmachine learning training application based on the AWS Lambda platform.Based on data parallelism,the data aggregation and training synchronization in Function as a Service(FaaS)are effectively realized.It also encrypts the data set,effectively reducing the risk of data leakage.We rent a cloud server and a Lambda,and then we conduct experiments to evaluate our applications.Our results indicate the effectiveness,rapidity,and economy of distributed training on FaaS.展开更多
Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have becom...Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information.However,current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information.In this paper,we introduce self-calibration multi-head self-attention Transformer(SMSTracker)as a solution to these challenges.It employs a hybrid tensor decomposition self-organizing multihead self-attention transformermechanism,which not only compresses and accelerates Transformer operations but also significantly reduces redundant data,thereby enhancing the accuracy and efficiency of tracking.Additionally,we introduce a self-calibration attention fusion block to resolve common issues of attention ambiguities and inconsistencies found in traditional trackingmethods,ensuring the stability and reliability of tracking performance across various scenarios.By integrating a hybrid tensor decomposition approach with a self-organizingmulti-head self-attentive transformer mechanism,SMSTracker enhances the efficiency and accuracy of the tracking process.Experimental results show that SMSTracker achieves competitive performance in visual object tracking,promising more robust and efficient tracking systems,demonstrating its potential to providemore robust and efficient tracking solutions in real-world applications.展开更多
In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,...In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.展开更多
Recently,weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling.However,there is a large performance gap between weakly supervised and fully superv...Recently,weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling.However,there is a large performance gap between weakly supervised and fully supervised salient object detectors because the scribble annotation can only provide very limited foreground/background information.Therefore,an intuitive idea is to infer annotations that cover more complete object and background regions for training.To this end,a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels.Specifically,k-means clustering algorithm was first performed on both colours and coordinates of original annotations,and then assigned the same labels to points having similar colours with colour cluster centres and near coordinate cluster centres.Next,the same annotations for pixels with similar colours within each kernel neighbourhood was set further.Extensive experiments on six benchmarks demonstrate that our method can significantly improve the performance and achieve the state-of-the-art results.展开更多
The advancement of navigation systems for the visually impaired has significantly enhanced their mobility by mitigating the risk of encountering obstacles and guiding them along safe,navigable routes.Traditional appro...The advancement of navigation systems for the visually impaired has significantly enhanced their mobility by mitigating the risk of encountering obstacles and guiding them along safe,navigable routes.Traditional approaches primarily focus on broad applications such as wayfinding,obstacle detection,and fall prevention.However,there is a notable discrepancy in applying these technologies to more specific scenarios,like identifying distinct food crop types or recognizing faces.This study proposes a real-time application designed for visually impaired individuals,aiming to bridge this research-application gap.It introduces a system capable of detecting 20 different food crop types and recognizing faces with impressive accuracies of 83.27%and 95.64%,respectively.These results represent a significant contribution to the field of assistive technologies,providing visually impaired users with detailed and relevant information about their surroundings,thereby enhancing their mobility and ensuring their safety.Additionally,it addresses the vital aspects of social engagements,acknowledging the challenges faced by visually impaired individuals in recognizing acquaintances without auditory or tactile signals,and highlights recent developments in prototype systems aimed at assisting with face recognition tasks.This comprehensive approach not only promises enhanced navigational aids but also aims to enrich the social well-being and safety of visually impaired communities.展开更多
Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false...Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.展开更多
Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variati...Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.展开更多
基金State Grid Jiangsu Electric Power Co.,Ltd.of the Science and Technology Project(Grant No.J2022004).
文摘Insulator defect detection plays a vital role in maintaining the secure operation of power systems.To address the issues of the difficulty of detecting small objects and missing objects due to the small scale,variable scale,and fuzzy edge morphology of insulator defects,we construct an insulator dataset with 1600 samples containing flashovers and breakages.Then a simple and effective surface defect detection method of power line insulators for difficult small objects is proposed.Firstly,a high-resolution featuremap is introduced and a small object prediction layer is added so that the model can detect tiny objects.Secondly,a simplified adaptive spatial feature fusion(SASFF)module is introduced to perform cross-scale spatial fusion to improve adaptability to variable multi-scale features.Finally,we propose an enhanced deformable attention mechanism(EDAM)module.By integrating a gating activation function,the model is further inspired to learn a small number of critical sampling points near reference points.And the module can improve the perception of object morphology.The experimental results indicate that concerning the dataset of flashover and breakage defects,this method improves the performance of YOLOv5,YOLOv7,and YOLOv8.In practical application,it can simply and effectively improve the precision of power line insulator defect detection and reduce missing detection for difficult small objects.
基金supported in part by the National Natural Science Foundation of China (62073271)the Natural Science Foundation for Distinguished Young Scholars of the Fujian Province of China (2023J06010)the Fundamental Research Funds for the Central Universities of China(20720220076)。
文摘Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.
文摘Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materials and method: This was a descriptive and analytical study over a 48-month period at CHU la Renaissance from January 1, 2018 to December 31, 2021, concerning patients admitted for penetrating cranioencephalic trauma by pointed object. Results: Twelve cases, all male, of penetrating cranioencephalic sharp-force trauma were identified. The mean age was 34 ± 7 years, with extremes of 11 and 60 years. Farmers and herders accounted for 31% and 25% of cases respectively. The average admission time was 47 hours. Brawls were the circumstances of occurrence in 81.2% of cases. Knives (33%), arrows (25%) and iron bars (16.6%) were the objects used. Altered consciousness was present in 43.8% of cases, and focal deficit in 50%. Scannographic lesions were fracture and/or embarrhment (12 cases), intra-parenchymal haematomas (6 cases) and presence of object in place (4 cases). Surgery was performed in 11 patients. Postoperative outcome was favorable in 9 patients. After 12 months, 2 patients were declared unfit. Conclusion: Penetrating head injuries caused by sharp objects are common in Chad. Urgent surgery can prevent disabling after-effects.
文摘This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how certain urban objects can act as emotional agents and how these events affect the urban system as a whole.An adaptive complex systems perspective is used to analyze these patterns.The results show patterns in the processes and dynamics that occur in cities based on the objects that affect the emotions of the people who live there.These patterns depend on the characteristics of the emotional charge of urban objects,but they can be generalized in the following process:(1)immediate reaction by some individuals;(2)emotions are generated at the individual level which begins to generalize,permuting to a collective emotion;(3)a process of reflection is detonated in some individuals from the reading of collective emotions;(4)integration/significance in the community both at the individual and collective level,on the concepts,roles and/or functions that give rise to the process in the system.Therefore,it is clear that emotions play a significant role in the development of cities and these aspects should be considered in the design strategies of all kinds of projects for the city.Future extensions of this work could include a deeper analysis of specific emotional events in urban environments,as well as possible implications for urban policy and decision making.
基金financially supported by the National Natural Science Foundation of China (Grant No.51911530205)the Natural Science Foundation of Jiangsu Province (Grant No.BK20201455)+5 种基金the Guangdong Basic and Applied Basic Research Foundation (Grant No.2023A1515010890)the Key Laboratory of PortWaterway and Sedimentation Engineering of MOT (Grant No.YK222001-2)the Open Research Fund of Key Laboratory of Water Security Guarantee in Guangdong-Hong Kong-Marco Greater Bay Area of Ministry of Water Resources (Grant No.WSGBAKJ202309)the Qing Lan Project of Jiangsu Universitiesthe Royal Society (Grant No.IECNSFC181321)。
文摘In this paper,the open-sourced computational fluid dynamics software,OpenFOAM~?,is used to study the fluctuation phenomenon of the water body inside a horizontally one-dimensional enclosed harbor basin with constant water depth triggered by falling wedges with various horizontal falling positions,initial falling velocities and masses.Based on both Fourier transfo rm analysis and wavelet spectrum analysis for the time series of the free surface elevations inside the harbor basin,it is found for the first time that the wedge falling inside the harbor can directly trigger harbor resonance.The influences of the three factors(including the horizontal falling position,the initial falling velocity,and the mass)on the response amplitudes of the lowest three resonant modes are also investigated.The results show that when the wedge falls on one of the nodal points of a resonant mode,the mode would be remarkably suppressed.Conversely,when the wedge falls on one of the anti-nodal points of a resonant mode,the mode would be evidently triggered.The initial falling velocity of the wedge mainly has a remarkable effect on the response amplitude of the most significant mode,and the latter shows a gradual increase trend with the increase of the former.While for the other two less significant modes,their response amplitudes fluctuate around certain constant values as the initial falling velocity rises.In general,the response amplitudes of all the lowest three modes are shown to gradually increase with the mass of the wedge.
基金This research was funded by the Natural Science Foundation of Hebei Province(F2021506004).
文摘Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection.
文摘The amount of needed control messages in wireless sensor networks(WSN)is affected by the storage strategy of detected events.Because broadcasting superfluous control messages consumes excess energy,the network lifespan can be extended if the quantity of control messages is decreased.In this study,an optimized storage technique having low control overhead for tracking the objects in WSN is introduced.The basic concept is to retain observed events in internal memory and preserve the relationship between sensed information and sensor nodes using a novel inexpensive data structure entitled Ordered Binary Linked List(OBLL).Whenever an object passes over the sensor area,the recognizing sensor can immediately produce an OBLL along the object’s route.To retrieve the entire information,the OBLL can be traversed with logarithmic complexity which is much less than the traversing complexity of existing linked list structures.Performance evaluation and simulations were carried out to ensure that the suggested technique minimizes the number of messages and thus saving energy and extending the network life.
文摘Three-dimensional(3D) scanning technology has undergone remarkable developments in recent years.Data acquired by 3D scanning have the form of 3D point clouds.The 3D scanned point clouds have data sizes that can be considered big data.They also contain measurement noise inherent in measurement data.These properties of 3D scanned point clouds make many traditional CG/visualization techniques difficult.This paper reviewed our recent achievements in developing varieties of high-quality visualizations suitable for the visual analysis of 3D scanned point clouds.We demonstrated the effectiveness of the method by applying the visualizations to various cultural heritage objects.The main visualization targets used in this paper are the floats in the Gion Festival in Kyoto(the float parade is on the UNESCO Intangible Cultural Heritage List) and Borobudur Temple in Indonesia(a UNESCO World Heritage Site).
文摘The transmission of video content over a network raises various issues relating to copyright authenticity,ethics,legality,and privacy.The protection of copyrighted video content is a significant issue in the video industry,and it is essential to find effective solutions to prevent tampering and modification of digital video content during its transmission through digital media.However,there are stillmany unresolved challenges.This paper aims to address those challenges by proposing a new technique for detectingmoving objects in digital videos,which can help prove the credibility of video content by detecting any fake objects inserted by hackers.The proposed technique involves using two methods,the H.264 and the extraction color features methods,to embed and extract watermarks in video frames.The study tested the performance of the system against various attacks and found it to be robust.The evaluation was done using different metrics such as Peak-Signal-to-Noise Ratio(PSNR),Mean Squared Error(MSE),Structural Similarity Index Measure(SSIM),Bit Correction Ratio(BCR),and Normalized Correlation.The accuracy of identifying moving objects was high,ranging from 96.3%to 98.7%.The system was also able to embed a fragile watermark with a success rate of over 93.65%and had an average capacity of hiding of 78.67.The reconstructed video frames had high quality with a PSNR of at least 65.45 dB and SSIMof over 0.97,making them imperceptible to the human eye.The system also had an acceptable average time difference(T=1.227/s)compared with other state-of-the-art methods.
基金a grant from the Basic Science Research Program through the National Research Foundation(NRF)(2021R1F1A1063634)funded by the Ministry of Science and ICT(MSIT)Republic of Korea.This research is supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R410)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding program Grant Code(NU/RG/SERC/12/6).
文摘Advances in machine vision systems have revolutionized applications such as autonomous driving,robotic navigation,and augmented reality.Despite substantial progress,challenges persist,including dynamic backgrounds,occlusion,and limited labeled data.To address these challenges,we introduce a comprehensive methodology toenhance image classification and object detection accuracy.The proposed approach involves the integration ofmultiple methods in a complementary way.The process commences with the application of Gaussian filters tomitigate the impact of noise interference.These images are then processed for segmentation using Fuzzy C-Meanssegmentation in parallel with saliency mapping techniques to find the most prominent regions.The Binary RobustIndependent Elementary Features(BRIEF)characteristics are then extracted fromdata derived fromsaliency mapsand segmented images.For precise object separation,Oriented FAST and Rotated BRIEF(ORB)algorithms areemployed.Genetic Algorithms(GAs)are used to optimize Random Forest classifier parameters which lead toimproved performance.Our method stands out due to its comprehensive approach,adeptly addressing challengessuch as changing backdrops,occlusion,and limited labeled data concurrently.A significant enhancement hasbeen achieved by integrating Genetic Algorithms(GAs)to precisely optimize parameters.This minor adjustmentnot only boosts the uniqueness of our system but also amplifies its overall efficacy.The proposed methodologyhas demonstrated notable classification accuracies of 90.9%and 89.0%on the challenging Corel-1k and MSRCdatasets,respectively.Furthermore,detection accuracies of 87.2%and 86.6%have been attained.Although ourmethod performed well in both datasets it may face difficulties in real-world data especially where datasets havehighly complex backgrounds.Despite these limitations,GAintegration for parameter optimization shows a notablestrength in enhancing the overall adaptability and performance of our system.
文摘Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection.Consequently,devising algorithms capable of accurately localizing specific objects within a scene in scenarios where annotated data is limited remains a formidable challenge.To solve this problem,this paper proposes an object discovery by request problem setting and a corresponding algorithmic framework.The proposed problem setting aims to identify specified objects in scenes,and the associated algorithmic framework comprises pseudo data generation and object discovery by request network.Pseudo-data generation generates images resembling natural scenes through various data augmentation rules,using a small number of object samples and scene images.The network structure of object discovery by request utilizes the pre-trained Vision Transformer(ViT)model as the backbone,employs object-centric methods to learn the latent representations of foreground objects,and applies patch-level reconstruction constraints to the model.During the validation phase,we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets.Experiments have proved that our method achieves state-of-the-art performance on Unmanned Aerial Vehicles-Bottle Detection(UAV-BD)dataset and self-constructed dataset Bottle,especially in multi-object scenarios.
文摘Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain less information.Many current methods,particularly those based on Feature Pyramid Network(FPN),address this challenge by leveraging multi-scale feature fusion.However,existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers,leading to suboptimal small object detection.To address this problem,we propose the Two-layerAttention Feature Pyramid Network(TA-FPN),featuring two key modules:the Two-layer Attention Module(TAM)and the Small Object Detail Enhancement Module(SODEM).TAM uses the attention module to make the network more focused on the semantic information of the object and fuse it to the lower layer,so that each layer contains similar semantic information,to alleviate the problem of small object information being submerged due to semantic gaps between different layers.At the same time,SODEM is introduced to strengthen the local features of the object,suppress background noise,enhance the information details of the small object,and fuse the enhanced features to other feature layers to ensure that each layer is rich in small object information,to improve small object detection accuracy.Our extensive experiments on challenging datasets such as Microsoft Common Objects inContext(MSCOCO)and Pattern Analysis Statistical Modelling and Computational Learning,Visual Object Classes(PASCAL VOC)demonstrate the validity of the proposedmethod.Experimental results show a significant improvement in small object detection accuracy compared to state-of-theart detectors.
文摘The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection in the field of blasting.Serverless Computing can provide a variety of computing services for people without hardware foundations and rich software development experience,which has aroused people’s interest in how to use it in the field ofmachine learning.In this paper,we design a distributedmachine learning training application based on the AWS Lambda platform.Based on data parallelism,the data aggregation and training synchronization in Function as a Service(FaaS)are effectively realized.It also encrypts the data set,effectively reducing the risk of data leakage.We rent a cloud server and a Lambda,and then we conduct experiments to evaluate our applications.Our results indicate the effectiveness,rapidity,and economy of distributed training on FaaS.
基金supported by the National Natural Science Foundation of China under Grant 62177029the Postgraduate Research&Practice Innovation Program of Jiangsu Province(KYCX21_0740),China.
文摘Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information.However,current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information.In this paper,we introduce self-calibration multi-head self-attention Transformer(SMSTracker)as a solution to these challenges.It employs a hybrid tensor decomposition self-organizing multihead self-attention transformermechanism,which not only compresses and accelerates Transformer operations but also significantly reduces redundant data,thereby enhancing the accuracy and efficiency of tracking.Additionally,we introduce a self-calibration attention fusion block to resolve common issues of attention ambiguities and inconsistencies found in traditional trackingmethods,ensuring the stability and reliability of tracking performance across various scenarios.By integrating a hybrid tensor decomposition approach with a self-organizingmulti-head self-attentive transformer mechanism,SMSTracker enhances the efficiency and accuracy of the tracking process.Experimental results show that SMSTracker achieves competitive performance in visual object tracking,promising more robust and efficient tracking systems,demonstrating its potential to providemore robust and efficient tracking solutions in real-world applications.
基金This work was partially supported by the National Natural Science Foundation of China(Grant Nos.61906168,U20A20171)Zhejiang Provincial Natural Science Foundation of China(Grant Nos.LY23F020023,LY21F020027)Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects(Grant Nos.2022SDSJ01).
文摘In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
文摘Recently,weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling.However,there is a large performance gap between weakly supervised and fully supervised salient object detectors because the scribble annotation can only provide very limited foreground/background information.Therefore,an intuitive idea is to infer annotations that cover more complete object and background regions for training.To this end,a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels.Specifically,k-means clustering algorithm was first performed on both colours and coordinates of original annotations,and then assigned the same labels to points having similar colours with colour cluster centres and near coordinate cluster centres.Next,the same annotations for pixels with similar colours within each kernel neighbourhood was set further.Extensive experiments on six benchmarks demonstrate that our method can significantly improve the performance and achieve the state-of-the-art results.
基金supported by theKorea Industrial Technology Association(KOITA)Grant Funded by the Korean government(MSIT)(No.KOITA-2023-3-003)supported by the MSIT(Ministry of Science and ICT),Korea,under the ITRC(Information Technology Research Center)Support Program(IITP-2024-2020-0-01808)Supervised by the IITP(Institute of Information&Communications Technology Planning&Evaluation)。
文摘The advancement of navigation systems for the visually impaired has significantly enhanced their mobility by mitigating the risk of encountering obstacles and guiding them along safe,navigable routes.Traditional approaches primarily focus on broad applications such as wayfinding,obstacle detection,and fall prevention.However,there is a notable discrepancy in applying these technologies to more specific scenarios,like identifying distinct food crop types or recognizing faces.This study proposes a real-time application designed for visually impaired individuals,aiming to bridge this research-application gap.It introduces a system capable of detecting 20 different food crop types and recognizing faces with impressive accuracies of 83.27%and 95.64%,respectively.These results represent a significant contribution to the field of assistive technologies,providing visually impaired users with detailed and relevant information about their surroundings,thereby enhancing their mobility and ensuring their safety.Additionally,it addresses the vital aspects of social engagements,acknowledging the challenges faced by visually impaired individuals in recognizing acquaintances without auditory or tactile signals,and highlights recent developments in prototype systems aimed at assisting with face recognition tasks.This comprehensive approach not only promises enhanced navigational aids but also aims to enrich the social well-being and safety of visually impaired communities.
基金the Scientific Research Fund of Hunan Provincial Education Department(23A0423).
文摘Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.
基金the Key Research and Development Program of Hainan Province(Grant Nos.ZDYF2023GXJS163,ZDYF2024GXJS014)National Natural Science Foundation of China(NSFC)(Grant Nos.62162022,62162024)+2 种基金the Major Science and Technology Project of Hainan Province(Grant No.ZDKJ2020012)Hainan Provincial Natural Science Foundation of China(Grant No.620MS021)Youth Foundation Project of Hainan Natural Science Foundation(621QN211).
文摘Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.