Abstract
To address missed detections and poor performance on multi-scale targets in monocular 3D object detection, a monocular 3D object detection algorithm for autonomous driving based on the Contextual Transformer (CM-RTM3D) is proposed. First, the Contextual Transformer (CoT) block is introduced into the ResNet-50 network to build a ResNet-Transformer architecture for feature extraction. Second, a multi-scale spatial perception (MSP) module is designed: scale-space response operations mitigate the loss of shallow features, a coordinate attention (CA) mechanism is embedded along the horizontal and vertical spatial directions, and a softmax function generates soft importance weights for each scale. Finally, the Huber loss function replaces the L1 loss in the offset loss. Experimental results on the KITTI autonomous driving dataset show that, compared with the RTM3D algorithm, the proposed algorithm improves AP3D by 4.84, 3.82, and 5.36 percentage points and APBEV by 4.75, 6.26, and 3.56 percentage points on the easy, moderate, and hard difficulty levels, respectively.
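The abstract names two concrete mechanisms that lend themselves to a short sketch: the Huber loss that replaces L1 in the offset term, and the softmax-generated soft importance weights over scales in the MSP module. Below is a minimal PyTorch sketch of both, assuming a standard Huber formulation; the delta threshold, the way per-scale logits are derived, and all tensor shapes are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only; names, shapes, and delta are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn.functional as F


def huber_offset_loss(pred_offset: torch.Tensor,
                      gt_offset: torch.Tensor,
                      delta: float = 1.0) -> torch.Tensor:
    """Huber loss used in place of L1 for the keypoint offset term."""
    diff = (pred_offset - gt_offset).abs()
    quadratic = 0.5 * diff.pow(2)
    linear = delta * (diff - 0.5 * delta)
    return torch.where(diff <= delta, quadratic, linear).mean()


def fuse_scales(features):
    """Softmax-weighted fusion of multi-scale feature maps (assumed to be
    resized to a common resolution beforehand), mimicking the MSP module's
    soft importance weights over scales."""
    # One importance logit per scale; here simply the global average
    # response of each map, purely for illustration.
    logits = torch.stack([f.mean() for f in features])
    weights = F.softmax(logits, dim=0)
    return sum(w * f for w, f in zip(weights, features))
```

For example, `fuse_scales([f1, f2, f3])` with three equally sized feature maps returns their softmax-weighted sum, while `huber_offset_loss` behaves quadratically for small offset errors and linearly for large ones.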
Authors
SHE Xiangyang; YAN Weijia; DONG Lihong (College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2024, No. 19, pp. 178-189 (12 pages)
Funding
Natural Science Basic Research Program of Shaanxi Province (2019JLM-11)
Science and Technology Plan of Shaanxi Province (2021JQ-576)
Shaanxi Provincial Education Department Project (19JK0526).