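To address the loss of unmanned aerial vehicle (UAV) navigation capability when the Global Navigation Satellite System (GNSS) is denied, a UAV visual navigation method based on improved Oriented FAST and Rotated BRIEF (ORB) image feature matching is proposed. First, to achieve absolute positioning of the UAV, a method for constructing a reference database of feature images is proposed. Second, to extract feature points from the image dataset, an ORB feature extraction algorithm with scale-space optimization that incorporates the Scale Invariant Feature Transform (SIFT) is adopted. Finally, to match image features against the reference image database quickly and with higher accuracy, an improved ORB feature matching algorithm, ORB+GMS+PROSAC, is proposed. A reference database was built by segmenting images in ArcGIS and used for experimental analysis. The results show that the ORB+GMS+PROSAC feature matching algorithm performs markedly better: matching accuracy rises by 5.05% and matching time falls by 41.61%, clearly outperforming other traditional feature matching algorithms.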
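The progressive sampling idea behind the PROSAC stage can be illustrated with a heavily simplified sketch. It is not the implementation from the paper: the motion model is assumed to be a pure 2D translation (so one match is a minimal sample), the candidate pool grows linearly rather than by PROSAC's actual growth function, and the names `prosac_translation`, `tol`, and `iters` are hypothetical.

```python
import random


def prosac_translation(matches, tol=2.0, iters=50, seed=0):
    """Estimate a 2D translation from ranked matches (simplified PROSAC-style loop).

    matches: list of (descriptor_distance, (x1, y1), (x2, y2)).
    Assumption: a smaller descriptor distance means a more reliable match.
    """
    if not matches:
        return None, []
    rng = random.Random(seed)
    ranked = sorted(matches, key=lambda m: m[0])  # best matches first
    best_model, best_inliers = None, []
    for i in range(iters):
        # Early iterations sample only from the top-ranked matches;
        # the pool then grows linearly until it covers all matches.
        pool = max(1, min(len(ranked), 1 + i * len(ranked) // iters))
        _, (x1, y1), (x2, y2) = ranked[rng.randrange(pool)]
        dx, dy = x2 - x1, y2 - y1  # hypothesis from one sampled match
        inliers = [
            m for m in ranked
            if abs((m[2][0] - m[1][0]) - dx) <= tol
            and abs((m[2][1] - m[1][1]) - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

Because the highest-ranked matches are tried first, a good hypothesis is usually found in the first few iterations, which is the intuition for why a ranked sampler can be faster than uniform random sampling.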
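Abstract: To address the low matching accuracy and long running time of traditional image recognition algorithms, this paper proposes a workpiece image recognition method based on an improved ORB-FLANN (Oriented FAST and Rotated BRIEF-Fast Library for Approximate Nearest Neighbors). The ORB feature description and the image feature matching algorithm are modified to overcome the weaknesses of traditional image recognition algorithms when images undergo scale and rotation changes, and to reduce the mismatch rate. For the feature points detected by ORB, the method uses the SURF (Speeded Up Robust Features) algorithm to add orientation information and compute the descriptors, yielding rotation- and scale-invariant feature points. It then combines FLANN with a bidirectional matching strategy for coarse matching, and finally applies the progressive sample consensus algorithm to remove the remaining mismatched point pairs and complete fine matching. Experimental results show that, compared with other methods, the improved algorithm raises matching accuracy by 2.6%~18.8% and 29.5%~43.9% on images with scale and rotation changes, respectively, with running time under 4 s in all cases, improving the efficiency and accuracy of workpiece image recognition.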
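The bidirectional matching strategy mentioned above can be sketched as a cross-check: a pair is kept only if each descriptor is the other's nearest neighbor. This is a minimal illustration, not the paper's code. It assumes binary descriptors stored as Python integers and uses a brute-force Hamming search in place of FLANN; the function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def nearest(query: int, candidates: list) -> int:
    """Index of the candidate closest to the query (brute force, not FLANN)."""
    return min(range(len(candidates)), key=lambda j: hamming(query, candidates[j]))


def cross_check_match(desc_a: list, desc_b: list) -> list:
    """Keep (i, j) only when j is i's nearest neighbor and i is j's."""
    if not desc_a or not desc_b:
        return []
    forward = [nearest(d, desc_b) for d in desc_a]   # A -> B
    backward = [nearest(d, desc_a) for d in desc_b]  # B -> A
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]
```

A descriptor in one image that merely happens to be close to something in the other image is dropped unless the relation also holds in reverse, which removes many one-sided mismatches before the consensus stage runs.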
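Abstract: To address the insufficient number and uneven distribution of feature points extracted by the traditional ORB (Oriented FAST and Rotated BRIEF) algorithm, a quadtree-based stepped-distribution algorithm for ORB features is proposed. A quadtree divides the image into regions of differing feature-point density, and the threshold is lowered step by step in each region to achieve adaptive extraction of FAST (Features from Accelerated Segment Test) corners. In addition, successively decreasing partition depths and feature-point extraction ratios are set according to the partitioned regions, which reduces computation time and feature redundancy and makes the feature points more evenly distributed. Coverage uniformity is used to quantify how evenly the feature points are distributed. Test results show that, compared with the traditional ORB algorithm, the proposed algorithm extracts on average 10.45% more feature points per image with a coverage uniformity that is on average 20% lower, and its running time is on average 20.54% shorter than that of the Mur-Artal algorithm, effectively increasing the number and uniformity of extracted feature points and improving computational efficiency.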
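The step-by-step threshold lowering within one region can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: each candidate pixel is assumed to already have a FAST-like corner response, and the parameter values (`start`, `step`, `floor`, `min_count`) and the function name are hypothetical.

```python
def adaptive_corners(scores, start=40, step=10, floor=10, min_count=3):
    """Lower the threshold in one region until enough corners pass.

    scores: corner responses of the candidate pixels in the region.
    Returns (threshold_used, indices_of_kept_candidates).
    """
    threshold = start
    while True:
        kept = [i for i, s in enumerate(scores) if s >= threshold]
        # Stop once enough corners are found or the threshold cannot drop further.
        if len(kept) >= min_count or threshold - step < floor:
            return threshold, kept
        threshold -= step
```

Textured regions stop at a high threshold and keep only strong corners, while weakly textured regions relax the threshold and still contribute a few points, which is what evens out the distribution.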
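Abstract: In heterogeneous-source image registration, differences in imaging mechanisms make two problems unavoidable: pixel-intensity correlation between images and rotational distortion. For the intensity-correlation problem, an image registration algorithm based on the radiation-variation insensitive feature transform (RIFT) has been proposed; it registers image pairs with small differences in pixel correlation accurately, but produces many false matches on rotationally distorted images. For the rotational distortion problem, the traditional ORB (oriented FAST and rotated BRIEF) algorithm is fairly stable when registering rotated images, but for image pairs with inconspicuous intensity variation its feature point detection quality is low and its registration accuracy is unsatisfactory. This paper therefore fuses phase consistency (PC) into the ORB algorithm, replacing traditional image intensity information with phase information and then constructing a rotation-invariant BRIEF descriptor, so that the method is robust to both pixel-intensity variation and rotational distortion. Registration experiments on infrared and visible-light images, whose pixel intensities are only weakly correlated, show that the proposed algorithm achieves high registration accuracy across different rotation magnitudes, with the RMSE stable at 1.7~2.1, outperforming RIFT; it performs well in the number of detected feature points, registration accuracy, and efficiency.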
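Abstract: To address the low computational efficiency and high mismatch rate of traditional feature matching algorithms and the insufficient accuracy of binocular vision measurement, an ORB (Oriented FAST and Rotated BRIEF) infrared binocular ranging method based on adaptive geometric constraints and random sample consensus is proposed. First, key points are detected and described with the FAST (Features from Accelerated Segment Test) and BRIEF (Binary Robust Independent Elementary Features) algorithms, and initial feature point matching is completed with a fast nearest-neighbor search algorithm. Then, thresholds are selected according to the slope and distance of the initially matched point pairs, and a slope- and distance-based geometric constraint is built to remove obviously wrong matches. Finally, random sample consensus removes the remaining outliers to complete fine matching, and the distance to the target object is computed using the thermal imager's calibration parameters. Experimental results show that, compared with traditional algorithms, the improved ORB algorithm gives better feature point quality and higher measurement accuracy, with a mean absolute ranging error of 1.64%, and has good practical value.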
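The slope-and-distance constraint and the final ranging step can be sketched as below. This is a minimal illustration, not the paper's method: it assumes the two images are placed side by side (the right image shifted by `offset_x`, at least the image width), takes the median slope and median length of the connecting lines as the reference values, and uses hypothetical tolerances. The depth formula is the standard rectified-stereo relation Z = f·B/d.

```python
import math
from statistics import median


def geometric_filter(pairs, offset_x, slope_tol=0.05, dist_tol=0.1):
    """Drop matches whose connecting line deviates from the typical slope/length.

    pairs: list of ((x1, y1), (x2, y2)) in left/right image coordinates.
    offset_x: horizontal shift of the right image (assumed >= image width).
    """
    if not pairs:
        return []
    slopes, dists = [], []
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = (x2 + offset_x) - x1, y2 - y1
        slopes.append(dy / dx)
        dists.append(math.hypot(dx, dy))
    s_med, d_med = median(slopes), median(dists)  # reference values (assumption)
    return [
        p for p, s, d in zip(pairs, slopes, dists)
        if abs(s - s_med) <= slope_tol and abs(d - d_med) <= dist_tol * d_med
    ]


def stereo_depth(focal_px, baseline_m, x_left, x_right):
    """Depth in meters from disparity for a rectified stereo pair."""
    return focal_px * baseline_m / (x_left - x_right)
```

Correct matches between two rectified views produce nearly parallel connecting lines of similar length, so a pair with a very different slope or length is rejected cheaply before the consensus step.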
Funding: A grant from the Basic Science Research Program through the National Research Foundation (NRF) (2021R1F1A1063634) funded by the Ministry of Science and ICT (MSIT), Republic of Korea. This research is supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding program Grant Code (NU/RG/SERC/12/6).
Abstract: Advances in machine vision systems have revolutionized applications such as autonomous driving, robotic navigation, and augmented reality. Despite substantial progress, challenges persist, including dynamic backgrounds, occlusion, and limited labeled data. To address these challenges, we introduce a comprehensive methodology to enhance image classification and object detection accuracy. The proposed approach integrates multiple methods in a complementary way. The process begins by applying Gaussian filters to mitigate noise interference. The images are then segmented using Fuzzy C-Means segmentation in parallel with saliency mapping techniques to find the most prominent regions. Binary Robust Independent Elementary Features (BRIEF) are then extracted from the data derived from the saliency maps and segmented images. For precise object separation, Oriented FAST and Rotated BRIEF (ORB) algorithms are employed. Genetic Algorithms (GAs) are used to optimize the Random Forest classifier parameters, which leads to improved performance. Our method stands out for its comprehensive approach, addressing changing backdrops, occlusion, and limited labeled data concurrently. A significant enhancement has been achieved by integrating GAs to precisely optimize parameters. This minor adjustment not only boosts the uniqueness of our system but also amplifies its overall efficacy. The proposed methodology has demonstrated classification accuracies of 90.9% and 89.0% on the challenging Corel-1k and MSRC datasets, respectively. Furthermore, detection accuracies of 87.2% and 86.6% have been attained. Although our method performed well on both datasets, it may face difficulties on real-world data, especially where datasets have highly complex backgrounds. Despite these limitations, GA integration for parameter optimization shows a notable strength in enhancing the overall adaptability and performance of our system.
Abstract: Discovering floating waste, especially bottles on water, is a crucial research problem in environmental hygiene. Nevertheless, real-world applications often face challenges such as interference from irrelevant objects and the high cost of data collection. Consequently, devising algorithms that can accurately localize specific objects within a scene when annotated data is limited remains a formidable challenge. To solve this problem, this paper proposes an object-discovery-by-request problem setting and a corresponding algorithmic framework. The proposed problem setting aims to identify specified objects in scenes, and the associated algorithmic framework comprises pseudo-data generation and an object-discovery-by-request network. Pseudo-data generation produces images resembling natural scenes through various data augmentation rules, using a small number of object samples and scene images. The object-discovery-by-request network uses a pre-trained Vision Transformer (ViT) model as the backbone, employs object-centric methods to learn the latent representations of foreground objects, and applies patch-level reconstruction constraints to the model. During the validation phase, we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets. Experiments show that our method achieves state-of-the-art performance on the Unmanned Aerial Vehicles-Bottle Detection (UAV-BD) dataset and the self-constructed dataset Bottle, especially in multi-object scenarios.
Abstract: Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses the attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information; this alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the detail information of the small object, and fuse the enhanced features into other feature layers, ensuring that each layer is rich in small object information and thereby improving small object detection accuracy. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MSCOCO) and Pattern Analysis, Statistical Modelling and Computational Learning, Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Abstract: The data analysis of blasting sites has always been a research goal of relevant researchers. The rise of mobile blasting robots has aroused many researchers' interest in machine learning methods for target detection in the field of blasting. Serverless computing can provide a variety of computing services for people without hardware foundations or rich software development experience, which has aroused interest in how to use it in the field of machine learning. In this paper, we design a distributed machine learning training application based on the AWS Lambda platform. Based on data parallelism, data aggregation and training synchronization in Function as a Service (FaaS) are effectively realized. The application also encrypts the data set, effectively reducing the risk of data leakage. We rent a cloud server and a Lambda, and then conduct experiments to evaluate our application. Our results indicate the effectiveness, rapidity, and economy of distributed training on FaaS.
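The data-parallel aggregation and synchronization described above can be sketched in a single process. This is an illustration only, not the authors' application: each "worker" stands in for one function invocation, the model is a one-variable linear regression, and no AWS API is called. All names and parameter values are hypothetical.

```python
def shard_gradient(w, b, shard):
    """Summed squared-error gradient of y ~ w*x + b over one worker's shard."""
    gw = sum(2 * (w * x + b - y) * x for x, y in shard)
    gb = sum(2 * (w * x + b - y) for x, y in shard)
    return gw, gb, len(shard)


def aggregate(results):
    """Combine per-worker sums into the mean gradient over all samples."""
    n = sum(r[2] for r in results)
    return sum(r[0] for r in results) / n, sum(r[1] for r in results) / n


def train(data, workers=2, lr=0.05, steps=500):
    """Synchronous data-parallel gradient descent (simulated workers)."""
    shards = [data[i::workers] for i in range(workers)]  # split the data set
    w = b = 0.0
    for _ in range(steps):
        # Every worker computes on its shard; the update waits for all of them.
        gw, gb = aggregate([shard_gradient(w, b, s) for s in shards])
        w, b = w - lr * gw, b - lr * gb
    return w, b
```

Because workers return gradient sums together with their sample counts, the aggregated value equals the full-batch gradient regardless of how the data is split, so the number of workers changes the speed of a step but not its result.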
Funding: This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61906168, U20A20171), the Zhejiang Provincial Natural Science Foundation of China (Grant Nos. LY23F020023, LY21F020027), and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (Grant Nos. 2022SDSJ01).
Abstract: In clinical practice, the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications. Measuring the amount of each type of urine sediment allows for screening, diagnosis, and evaluation of kidney and urinary tract disease, providing insight into the specific type and severity. However, manual urine sediment examination is labor-intensive, time-consuming, and subjective. Traditional machine learning based object detection methods require hand-crafted features for localization and classification; they generalize poorly and struggle to count urine sediments quickly and accurately. Deep learning based object detection methods have the potential to address these challenges, but they require access to large urine sediment image datasets. Unfortunately, only a limited number of urine sediment datasets are publicly available. To alleviate the lack of urine sediment datasets in medical image analysis, we propose a new dataset named UriSed2K, which contains 2465 high-quality images annotated with expert guidance. Two main challenges are associated with our dataset: a large number of small objects and the occlusion between these small objects. Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset. Specifically, our goal is to improve the accuracy and efficiency of the detection algorithm and, in doing so, provide medical professionals with an automatic detector that saves time and effort. We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO. The proposed algorithm comprises a local context attention module and a global background suppression module, which aid the detector in distinguishing urine sediment features in the image. The local context attention module captures context information beyond the object region, while the global background suppression module emphasizes objects in uninformative backgrounds. We comprehensively evaluate our method on the UriSed2K dataset, which includes seven categories of urine sediments: erythrocytes (red blood cells), leukocytes (white blood cells), epithelial cells, crystals, mycetes, broken erythrocytes, and broken leukocytes. The method achieves the best average precision (AP) of 95.3% while taking only 10 ms per image. The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
Funding: This research was funded by the Natural Science Foundation of Hebei Province (F2021506004).
Abstract: Transformer-based models have facilitated significant advances in object detection. However, their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. Firstly, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model in UAV image object detection.
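For readers unfamiliar with the operation that the query filter builds on, plain greedy non-maximum suppression can be sketched as below. This is the generic textbook procedure, not the training-aware variant described in the abstract; boxes are assumed to be `(x1, y1, x2, y2)` tuples and the 0.5 threshold is an arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the best box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not heavily overlap any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

In densely clustered scenes many predictions overlap the same object, so suppressing near-duplicates reduces redundant detections; the abstract applies a related idea to similar queries during training rather than to final boxes.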