Abstract
Targets in high-altitude drone imagery are generally small and weakly featured, and they are strongly affected by complex weather conditions, so object detection based on a single modality (visible or infrared images alone) suffers from high missed-detection and false-detection rates. To address this problem, this paper proposes DM-YOLO, a dual-modal real-time object detection model enhanced by reparameterization. First, visible and infrared images are fused by channel concatenation, which exploits the complementary information in the two modalities at very low cost. Second, a more efficient reparameterization module is proposed and used to build a stronger backbone network, RepCSPDarkNet, improving the backbone's ability to extract features from dual-modal images. Third, a multi-level feature fusion module is proposed that fuses the multi-scale feature information of small, weak objects through multi-receptive-field dilated convolution and an attention mechanism, strengthening their multi-scale feature representation. Finally, the deep detection layer of the feature pyramid, which contributes little to detecting small, weak objects, is removed, reducing model size while maintaining detection accuracy. Experiments on the large-scale dual-modal image dataset DroneVehicle show that the detection accuracy of DM-YOLO is 2.45% higher than that of the baseline YOLOv5s and better than that of YOLOv6 and YOLOv7 models of comparable size. DM-YOLO improves the accuracy and robustness of object detection under complex illumination and weather conditions and reaches a detection speed of 82 FPS, which meets the requirements of real-time detection.
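The channel-concatenation fusion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the channel-first nested-list layout, the 3-channel visible and 1-channel infrared shapes, and the names `fuse_by_channel_concat` and `make_image` are all assumptions made for illustration. It only shows why this kind of fusion is cheap: the two aligned images are stacked along the channel axis, and nothing is learned or computed at the fusion point itself.

```python
from typing import List

Image = List[List[List[float]]]  # channel-first layout: [channel][row][col]


def fuse_by_channel_concat(visible: Image, infrared: Image) -> Image:
    """Stack the channels of two spatially aligned images into one input."""
    vis_shape = (len(visible[0]), len(visible[0][0]))
    ir_shape = (len(infrared[0]), len(infrared[0][0]))
    if vis_shape != ir_shape:
        raise ValueError(f"spatial size mismatch: {vis_shape} vs {ir_shape}")
    # The result is a new outer list that shares the original channel data.
    # No parameters are involved; a detector consuming this input would only
    # need a first convolution that accepts the larger channel count.
    return [*visible, *infrared]


def make_image(channels: int, height: int, width: int, value: float) -> Image:
    """Hypothetical helper: build a constant-valued image for the demo."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


visible = make_image(3, 4, 5, 0.5)   # assumed 3-channel visible-light frame
infrared = make_image(1, 4, 5, 0.9)  # assumed 1-channel infrared frame
fused = fuse_by_channel_concat(visible, infrared)

# Report the fused shape as (channels, height, width).
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a real pipeline the same operation would normally be a single tensor concatenation along the channel axis, applied after the visible and infrared frames have been registered to the same resolution.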
Authors
LI Yunchen (李允臣)
ZHANG Rui (张睿)
WANG Jiabao (王家宝)
LI Yang (李阳)
WANG Ziqi (王梓祺)
CHEN Yao (陈瑶)
Affiliation: College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2024, No. 9, pp. 162-172 (11 pages)
Funding
Natural Science Research Fund for Colleges and Universities of Jiangsu Province (BK20200581).
Keywords
Reparameterization
Dual modality
Real-time object detection
Multi-scale features
Attention mechanism