Abstract
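Mainstream monocular 3D object detection networks follow a keypoint-detection paradigm, and inaccurate keypoint prediction and depth estimation limit their performance. This paper proposes MonoAux, a monocular 3D object detection algorithm with multi-keypoint constraints and auxiliary depth estimation. It uses the projected corner points of the 3D bounding box, together with the projected centers of its top and bottom faces, to supplement the projected center of the 3D box, and the resulting multi-keypoint constraints improve keypoint prediction accuracy. It also proposes a LiDAR-free decoupled depth estimation method that derives additional auxiliary supervision signals for depth estimation from geometric relationships, improving depth accuracy without introducing LiDAR point cloud data. Both the multi-keypoint constraints and the depth estimation assistance are used only during training, so inference incurs no additional computational cost. On the KITTI 3D object detection validation and test sets, MonoAux improves detection accuracy over the MonoDLE baseline by 3.87% and 4.64%, respectively. It also shows clear performance advantages over other SOTA methods and even outperforms some methods that use extra data.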
Mainstream monocular 3D object detection algorithms typically rely on a keypoint-based paradigm. While widely adopted, these approaches often struggle to predict keypoints and estimate depth accurately, which ultimately limits the performance of monocular 3D detectors. The core problem lies in the inherent difficulty of recovering precise keypoints and depth values from a single 2D image. This paper addresses these issues with MonoAux, a monocular 3D detector that incorporates multi-keypoint constraints and depth estimation assistance.
Traditional monocular 3D detection algorithms generally use the projected center of the 3D bounding box as the primary keypoint for detection and localization. However, relying solely on this center point often leads to suboptimal results, as it does not fully capture the spatial characteristics of the object. To improve keypoint prediction accuracy, MonoAux introduces additional keypoints: the projected corner points of the 3D bounding box and the projected centers of its upper and lower surfaces. These keypoints serve as supplementary constraints on keypoint prediction and thus enhance the algorithm's ability to estimate the object's orientation and shape in 3D space. More accurate keypoints allow MonoAux to generate more accurate 3D bounding boxes, which in turn improves detection performance.
In addition to the multi-keypoint constraints, MonoAux introduces a depth estimation approach that operates entirely without LiDAR data. Many state-of-the-art (SOTA) 3D object detection methods rely on LiDAR point clouds to obtain accurate depth information, but this can be computationally expensive and requires specialized hardware. MonoAux instead proposes a LiDAR-free decoupled depth estimation method, which uses only the geometric relationships inherent in the scene to derive auxiliary supervision signals for depth prediction. As a result, the algorithm estimates depth more accurately while remaining efficient and avoiding the need for expensive sensors.
The multi-keypoint constraints and depth estimation assistance are applied only during training, so the inference phase incurs no additional computational cost. Experiments on the KITTI 3D object detection validation and test sets show that MonoAux improves detection accuracy over the MonoDLE baseline by 3.87% and 4.64%, respectively. MonoAux also shows significant performance advantages over other state-of-the-art methods and even outperforms some methods that rely on additional data. In summary, MonoAux advances monocular 3D object detection by addressing the core challenges of keypoint prediction and depth estimation: its multi-keypoint constraints and LiDAR-free depth estimation assistance improve accuracy without adding inference cost, making it a promising solution for a range of applications.
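The geometry behind the auxiliary keypoints can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the network, the losses, or the exact decoupled depth formulation. The function names, the box parameterization (geometric center, KITTI-style yaw about a downward-pointing camera Y axis), and the height-based pinhole relation z = fy * H / (v_bottom - v_top) are all assumptions chosen only to show how corner and face-center projections can be generated as training targets and how a depth value can be derived from them without LiDAR.

```python
import math
from itertools import product


def box_keypoints_3d(center, dims, yaw):
    """Return 11 keypoints of a 3D box in camera coordinates (hypothetical helper).

    Order: 8 corners, top-face center, bottom-face center, box center.
    Assumes the camera Y axis points down, `center` is the geometric box
    center, `dims` is (height, width, length), and `yaw` rotates about Y.
    """
    x, y, z = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    # Corner offsets in the box's own frame (length along x, height along y).
    local = [(dx * l / 2, dy * h / 2, dz * w / 2)
             for dx, dy, dz in product((-1, 1), repeat=3)]
    # Y points down, so the top face has the smaller y value.
    local += [(0.0, -h / 2, 0.0), (0.0, h / 2, 0.0), (0.0, 0.0, 0.0)]
    # Rotate about Y, then translate to the box center.
    return [(x + c * px + s * pz, y + py, z - s * px + c * pz)
            for px, py, pz in local]


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * X / Z + cx, fy * Y / Z + cy)


def depth_from_height(height_3d, v_top, v_bottom, fy):
    """Depth implied by the pixel distance between top and bottom face centers.

    Both face centers share the box center's depth, so under the pinhole
    model v_bottom - v_top = fy * height_3d / z.
    """
    return fy * height_3d / (v_bottom - v_top)


if __name__ == "__main__":
    # Illustrative values only; intrinsics and box are made up for the demo.
    fx = fy = 720.0
    cx, cy = 620.0, 180.0
    keypoints = box_keypoints_3d((2.0, 1.0, 20.0), (1.5, 1.6, 4.0), 0.3)
    pixels = [project(p, fx, fy, cx, cy) for p in keypoints]
    print("projected box center:", pixels[10])
    print("depth from height:", depth_from_height(1.5, pixels[8][1], pixels[9][1], fy))
```

In a training setup of the kind the abstract describes, targets like these would supervise auxiliary heads that are discarded at inference time, which is consistent with the stated claim of no extra inference cost.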
Authors
郑锦
王森
李航
周裕海
ZHENG Jin; WANG Sen; LI Hang; ZHOU Yu-Hai (School of Computer Science and Engineering, Beihang University, Beijing 100191; State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191)
Source
《计算机学报》
EI
CAS
CSCD
PKU Core Journals
2024, No. 12, pp. 2803-2818 (16 pages)
Chinese Journal of Computers
Keywords
3D object detection
keypoint prediction
corner projection point
depth estimation
laser point cloud