Abstract
To address missed detections and poor performance on multi-scale targets in monocular 3D object detection, a monocular 3D object detection algorithm for autonomous driving based on the Contextual Transformer (CM-RTM3D) is proposed. First, the Contextual Transformer (CoT) block is introduced into the ResNet-50 network to build a ResNet-Transformer architecture for feature extraction. Second, a multi-scale spatial perception (MSP) module is designed: scale-space response operations mitigate the loss of shallow features, a coordinate attention (CA) mechanism is embedded along the horizontal and vertical spatial directions, and a softmax function generates soft importance weights for each scale. Finally, the Huber loss function replaces the L1 loss in the offset loss. Experimental results on the KITTI autonomous driving dataset show that, compared with the RTM3D algorithm, the proposed algorithm improves AP3D by 4.84, 3.82, and 5.36 percentage points and APBEV by 4.75, 6.26, and 3.56 percentage points on the easy, moderate, and hard difficulty levels, respectively.
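The abstract names two concrete mechanisms that lend themselves to a short sketch: the Huber loss that replaces L1 in the offset term, and the softmax-generated soft importance weights over scales in the MSP module. Below is a minimal PyTorch sketch of both, assuming a standard Huber formulation; the delta threshold, the way per-scale logits are derived, and all tensor shapes are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only; names, shapes, and delta are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn.functional as F


def huber_offset_loss(pred_offset: torch.Tensor,
                      gt_offset: torch.Tensor,
                      delta: float = 1.0) -> torch.Tensor:
    """Huber loss used in place of L1 for the keypoint offset term."""
    diff = (pred_offset - gt_offset).abs()
    quadratic = 0.5 * diff.pow(2)
    linear = delta * (diff - 0.5 * delta)
    return torch.where(diff <= delta, quadratic, linear).mean()


def fuse_scales(features):
    """Softmax-weighted fusion of multi-scale feature maps (assumed to be
    resized to a common resolution beforehand), mimicking the MSP module's
    soft importance weights over scales."""
    # One importance logit per scale; here simply the global average
    # response of each map, purely for illustration.
    logits = torch.stack([f.mean() for f in features])
    weights = F.softmax(logits, dim=0)
    return sum(w * f for w, f in zip(weights, features))
```

For example, `fuse_scales([f1, f2, f3])` with three equally sized feature maps returns their softmax-weighted sum, while `huber_offset_loss` behaves quadratically for small offset errors and linearly for large ones.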
Authors
SHE Xiangyang; YAN Weijia; DONG Lihong (College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2024, No. 19, pp. 178-189 (12 pages)
Funding
Natural Science Basic Research Program of Shaanxi Province (2019JLM-11)
Science and Technology Plan of Shaanxi Province (2021JQ-576)
Shaanxi Provincial Education Department Project (19JK0526).