目的基于点云的3D目标检测是自动驾驶领域的重要技术之一。由于点云的非结构化特性,通常将点云进行体素化处理,然后基于体素特征完成3D目标检测任务。在基于体素的3D目标检测算法中,对点云进行体素化时会导致部分点云的数据信息和结构...目的基于点云的3D目标检测是自动驾驶领域的重要技术之一。由于点云的非结构化特性,通常将点云进行体素化处理,然后基于体素特征完成3D目标检测任务。在基于体素的3D目标检测算法中,对点云进行体素化时会导致部分点云的数据信息和结构信息的损失,降低检测效果。针对该问题,本文提出一种融合点云深度信息的方法,有效提高了3D目标检测的精度。方法首先将点云通过球面投影的方法转换为深度图像,然后将深度图像与3D目标检测算法提取的特征图进行融合,从而对损失信息进行补全。由于此时的融合特征以2D伪图像的形式表示,因此使用YOLOv7(you only look once v7)中的主干网络提取融合特征。最后设计回归与分类网络,将提取到的融合特征送入到网络中预测目标的位置、大小以及类别。结果本文方法在KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)数据集和DAIR-V2X数据集上进行测试。以AP(average precision)值为评价指标,在KITTI数据集上,改进算法PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有0.84%、2.3%和1.77%的提升。以自行车简单难度为例,改进算法PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有5.15%、1.1%和2.75%的提升。在DAIR-V2X数据集上,PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有17.46%、20.72%和12.7%的提升。以汽车简单难度为例,PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有13.53%、5.59%和1.08%的提升。结论本文方法在KITTI数据集和DAIR-V2X数据集上都取得了较好表现,减少了点云在体素化过程中的信息损失并提高了网络对融合特征的提取能力和多尺度目标的检测性能,使目标检测结果更加准确。展开更多
This paper presents a voxel-based region growing method for automatic road surface extraction from mobile laser scanning point clouds in an expressway environment.The proposed method has three major steps:constructing...This paper presents a voxel-based region growing method for automatic road surface extraction from mobile laser scanning point clouds in an expressway environment.The proposed method has three major steps:constructing a voxel model;extracting the road surface points by employing the voxel-based segmentation algorithm;refining the road boundary using the curb-based segmentation algorithm.To evaluate the accuracy of the proposed method,the two-point cloud datasets of two typical test sites in an expressway environment consisting of flat and bumpy surfaces with a high slope were used.The proposed algorithm extracted the road surface successfully with high accuracy.There was an average recall of 99.5%,the precision was 96.3%,and the F1 score was 97.9%.From the extracted road surface,a framework for the estimation of road roughness was proposed.Good agreement was achieved when comparing the results of the road roughness map with the visual image,indicating the feasibility and effectiveness of the proposed framework.展开更多
The self-attention networks and Transformer have dominated machine translation and natural language processing fields,and shown great potential in image vision tasks such as image classification and object detection.I...The self-attention networks and Transformer have dominated machine translation and natural language processing fields,and shown great potential in image vision tasks such as image classification and object detection.Inspired by the great progress of Transformer,we propose a novel general and robust voxel feature encoder for 3D object detection based on the traditional Transformer.We first investigate the permutation invariance of sequence data of the self-attention and apply it to point cloud processing.Then we construct a voxel feature layer based on the self-attention to adaptively learn local and robust context of a voxel according to the spatial relationship and context information exchanging between all points within the voxel.Lastly,we construct a general voxel feature learning framework with the voxel feature layer as the core for 3D object detection.The voxel feature with Transformer(VFT)can be plugged into any other voxel-based 3D object detection framework easily,and serves as the backbone for voxel feature extractor.Experiments results on the KITTI dataset demonstrate that our method achieves the state-of-the-art performance on 3D object detection.展开更多
文摘目的基于点云的3D目标检测是自动驾驶领域的重要技术之一。由于点云的非结构化特性,通常将点云进行体素化处理,然后基于体素特征完成3D目标检测任务。在基于体素的3D目标检测算法中,对点云进行体素化时会导致部分点云的数据信息和结构信息的损失,降低检测效果。针对该问题,本文提出一种融合点云深度信息的方法,有效提高了3D目标检测的精度。方法首先将点云通过球面投影的方法转换为深度图像,然后将深度图像与3D目标检测算法提取的特征图进行融合,从而对损失信息进行补全。由于此时的融合特征以2D伪图像的形式表示,因此使用YOLOv7(you only look once v7)中的主干网络提取融合特征。最后设计回归与分类网络,将提取到的融合特征送入到网络中预测目标的位置、大小以及类别。结果本文方法在KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)数据集和DAIR-V2X数据集上进行测试。以AP(average precision)值为评价指标,在KITTI数据集上,改进算法PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有0.84%、2.3%和1.77%的提升。以自行车简单难度为例,改进算法PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有5.15%、1.1%和2.75%的提升。在DAIR-V2X数据集上,PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有17.46%、20.72%和12.7%的提升。以汽车简单难度为例,PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有13.53%、5.59%和1.08%的提升。结论本文方法在KITTI数据集和DAIR-V2X数据集上都取得了较好表现,减少了点云在体素化过程中的信息损失并提高了网络对融合特征的提取能力和多尺度目标的检测性能,使目标检测结果更加准确。
基金Project(SIIT-AUN/SEED-Net-G-S1 Y16/018)supported by the Doctoral Asean University Network ProgramProject supported by the Metropolitan Expressway Co.,Ltd.,Japan+2 种基金Project supported by Elysium Co.Ltd.Project supported by Aero Asahi Corporation,Co.,Ltd.Project supported by the Expressway Authority of Thailand。
文摘This paper presents a voxel-based region growing method for automatic road surface extraction from mobile laser scanning point clouds in an expressway environment.The proposed method has three major steps:constructing a voxel model;extracting the road surface points by employing the voxel-based segmentation algorithm;refining the road boundary using the curb-based segmentation algorithm.To evaluate the accuracy of the proposed method,the two-point cloud datasets of two typical test sites in an expressway environment consisting of flat and bumpy surfaces with a high slope were used.The proposed algorithm extracted the road surface successfully with high accuracy.There was an average recall of 99.5%,the precision was 96.3%,and the F1 score was 97.9%.From the extracted road surface,a framework for the estimation of road roughness was proposed.Good agreement was achieved when comparing the results of the road roughness map with the visual image,indicating the feasibility and effectiveness of the proposed framework.
基金National Natural Science Foundation of China(No.61806006)Innovation Program for Graduate of Jiangsu Province(No.KYLX160-781)University Superior Discipline Construction Project of Jiangsu Province。
文摘The self-attention networks and Transformer have dominated machine translation and natural language processing fields,and shown great potential in image vision tasks such as image classification and object detection.Inspired by the great progress of Transformer,we propose a novel general and robust voxel feature encoder for 3D object detection based on the traditional Transformer.We first investigate the permutation invariance of sequence data of the self-attention and apply it to point cloud processing.Then we construct a voxel feature layer based on the self-attention to adaptively learn local and robust context of a voxel according to the spatial relationship and context information exchanging between all points within the voxel.Lastly,we construct a general voxel feature learning framework with the voxel feature layer as the core for 3D object detection.The voxel feature with Transformer(VFT)can be plugged into any other voxel-based 3D object detection framework easily,and serves as the backbone for voxel feature extractor.Experiments results on the KITTI dataset demonstrate that our method achieves the state-of-the-art performance on 3D object detection.