Abstract
Targets in remote sensing images vary widely in size, and capturing information about targets at different scales is difficult, so such targets are hard to identify effectively. In addition, the traditional Transformer runs short of computational resources when processing high-resolution images, and combining a single loss calculation with the Hungarian algorithm increases the fluctuation of the matching cost, which hurts both the convergence speed and the accuracy of the algorithm. To address these problems, this paper proposes a multi-scale remote sensing target detection algorithm based on an improved DAB-DETR, named MSDAB-DETR (Multi-scale dynamic anchor boxes for DETR). First, a new multi-scale attention fusion module exploits the differences between feature information at different resolutions to achieve multi-scale prediction on remote sensing images. Second, an efficient attention mechanism is adopted to improve the self-attention mechanism in the Transformer, reducing the memory footprint of the original model. Finally, the SIoU loss function is used as the bounding box regression loss and combined with the Hungarian algorithm, which weakens the fluctuation of bipartite matching, accelerates convergence, and further improves bounding box regression. Experimental results show that the method reaches detection accuracies of 95.3% and 71.5% on the NWPU VHR-10 and DIOR datasets, respectively. On the NWPU VHR-10 dataset, the average detection accuracy for small, medium, and large targets improves by 10.5%, 1.8%, and 2.7%, respectively, over the DAB-DETR model. The memory footprint is also reduced by about 9%.
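The third contribution pairs a box-overlap loss with Hungarian matching between predictions and ground-truth targets. The sketch below illustrates only the general idea of a one-to-one assignment driven by an overlap-based cost; it is not the authors' implementation. Two simplifications are assumed and should be read as such: plain IoU stands in for SIoU (which adds angle, distance, and shape terms), and an exhaustive search over assignments stands in for the Hungarian algorithm. The function names `iou` and `match` and the example boxes are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(preds, targets):
    """One-to-one assignment of targets to predictions minimising sum(1 - IoU).

    Assumption: brute force replaces the Hungarian algorithm, so this is only
    practical for a handful of boxes. Requires len(preds) >= len(targets).
    Returns (total_cost, [(pred_index, target_index), ...]).
    """
    cost = [[1.0 - iou(p, t) for t in targets] for p in preds]
    best_cost, best_pairs = float("inf"), []
    # Each permutation picks a distinct prediction for every target.
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost[p][t] for t, p in enumerate(perm))
        if total < best_cost:
            best_cost = total
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_cost, best_pairs


# Hypothetical boxes: three predictions, two ground-truth targets.
preds = [(0, 0, 10, 10), (20, 20, 40, 40), (5, 5, 15, 15)]
targets = [(21, 19, 41, 39), (1, 1, 11, 11)]
total_cost, pairs = match(preds, targets)
print(pairs, round(total_cost, 4))
```

With more than a few boxes, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` applied to the same cost matrix would be the usual choice; the cost matrix is the only part that changes when a different box loss is substituted.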
Authors
李烨
周生翠
张驰
LI Ye; ZHOU Shengcui; ZHANG Chi (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《数据采集与处理》
CSCD
Peking University Core Journal
2024, Issue 6, pp. 1455-1469 (15 pages)
Journal of Data Acquisition and Processing