A Monocular 3D Object Detection Algorithm with Multi-Keypoint Constraints and Depth Estimation Assistance


Abstract: Mainstream monocular 3D object detection networks follow a keypoint-based paradigm. Their keypoint predictions and depth estimates are often inaccurate, because precise keypoints and depth values are inherently difficult to recover from a single 2D image, and this limits detector performance. This paper proposes MonoAux, a monocular 3D detector that combines multi-keypoint constraints with depth estimation assistance.

Conventional monocular 3D detectors use the projected center of the 3D bounding box as the primary keypoint for detection and localization. Relying on this point alone often gives suboptimal results, since it does not fully capture the object's spatial characteristics. MonoAux therefore supplements it with the projections of the 3D box corners and of the centers of the box's upper and lower surfaces. These additional keypoints act as constraints that improve keypoint prediction accuracy and help the network estimate the object's orientation and shape in 3D space, which yields more accurate 3D bounding boxes and better detection performance.

MonoAux also introduces a LiDAR-free decoupled depth estimation method. Many state-of-the-art (SOTA) 3D object detection methods rely on LiDAR point clouds for accurate depth information, which can be computationally expensive and requires specialized hardware. MonoAux instead derives additional auxiliary supervision signals for depth from geometric relationships, improving depth estimation accuracy without using any point cloud data.

Both the multi-keypoint constraints and the depth estimation assistance are applied only during training, so inference incurs no additional computational cost. On the KITTI 3D object detection validation and test sets, MonoAux improves detection accuracy over the MonoDLE baseline by 3.87% and 4.64%, respectively. It also shows significant performance advantages over other SOTA methods and even outperforms some methods that use additional data.
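To make the keypoint set described in the abstract concrete, the following is a minimal sketch of how the center, the eight corners, and the upper- and lower-surface centers of a 3D box could be projected into the image, and how a depth value could be recovered from purely geometric quantities. It is not the authors' implementation. The function names, the KITTI-style camera frame (x right, y down, z forward), the use of the geometric box center, and the pinhole depth-from-height relation are all assumptions made for illustration; the abstract does not specify how the decoupled depth supervision is actually derived.

```python
import math

def box_keypoints_3d(center, dims, yaw):
    """Return 11 keypoints in camera coordinates (hypothetical helper).

    Order: box center, 8 corners, upper-surface center, lower-surface center.
    Assumed frame: x right, y down, z forward; `center` is the geometric
    center of the box; dims = (height, width, length); yaw rotates about y.
    """
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    points = [(cx, cy, cz)]
    for dy in (-h / 2, h / 2):  # upper face first (y points down)
        for dx, dz in ((l / 2, w / 2), (l / 2, -w / 2),
                       (-l / 2, -w / 2), (-l / 2, w / 2)):
            # Rotate the corner offset about the y axis, then translate.
            x = c * dx + s * dz
            z = -s * dx + c * dz
            points.append((cx + x, cy + dy, cz + z))
    points.append((cx, cy - h / 2, cz))  # upper-surface center
    points.append((cx, cy + h / 2, cz))  # lower-surface center
    return points

def project(point, fx, fy, u0, v0):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + u0, fy * y / z + v0)

def depth_from_height(fy, height_3d, v_top, v_bottom):
    """Depth from the pinhole relation z = fy * H / h (assumed relation).

    H is the 3D box height; h is the pixel distance between the projected
    upper- and lower-surface centers.
    """
    return fy * height_3d / (v_bottom - v_top)

if __name__ == "__main__":
    # Illustrative values only; not taken from the paper or from KITTI.
    fx = fy = 700.0
    u0, v0 = 600.0, 180.0
    keypoints = box_keypoints_3d((1.0, 1.5, 20.0), (1.5, 1.6, 4.0), 0.3)
    pixels = [project(p, fx, fy, u0, v0) for p in keypoints]
    v_top, v_bottom = pixels[9][1], pixels[10][1]
    print(pixels[0], depth_from_height(fy, 1.5, v_top, v_bottom))
```

In a training setup along these lines, the projected points would serve as extra regression targets next to the center keypoint, and the geometric depth would act as an auxiliary signal. Neither needs to be computed at inference time, which is consistent with the abstract's statement that the auxiliary components add no inference cost.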

Authors: ZHENG Jin; WANG Sen; LI Hang; ZHOU Yu-Hai (School of Computer Science and Engineering, Beihang University, Beijing 100191; State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191)

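Affiliations: School of Computer Science and Engineering, Beihang University; State Key Laboratory of Virtual Reality Technology and Systems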

Source: Chinese Journal of Computers (计算机学报), 2024, No. 12, pp. 2803-2818 (16 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Keywords: 3D object detection; keypoint prediction; corner projection point; depth estimation; laser point cloud

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
