Abstract
A high-performance two-stage 3D object detection algorithm based on the fusion of deep semantic and position information (DSPF-RCNN) is proposed. In the first stage, a deep feature extraction-region proposal network (DFE-RPN) is proposed, enabling the network to extract deeper texture and semantic features of targets in the bird's-eye view. In the second stage, an aware-point semantics and position feature fusion (ASPF) module is proposed, enabling the network to adaptively extract the most discriminative features of targets and strengthening the aggregation ability of the center points during feature extraction. The algorithm is evaluated on the KITTI dataset. On the test set, the detection accuracies for the Car class at the Easy, Moderate, and Hard levels surpass those of existing mainstream algorithms, reaching 89.90%, 81.04%, and 76.45%, respectively. On the validation set, the detection accuracies for the Car and Cyclist classes at the Moderate level are 84.40% and 73.90%, respectively, an improvement of approximately 4% over mainstream algorithms, with an inference time of 64 ms. Finally, the algorithm is deployed on a real-vehicle platform for online detection, verifying its engineering value.
Objective  Precise perception of the surrounding environment is the basis for realizing various functions in autonomous driving. Accurate identification of the locations of 3D targets in real scenes is key to improving the overall performance of autonomous driving. Lidar has become pivotal in this field because of its superiority in sensing richer 3D spatial information while being less affected by weather and other environmental factors. Current 3D target detection methods are mainly based on deep learning, which can achieve higher detection accuracy than traditional clustering and segmentation algorithms. The key to target detection based on deep learning is the in-depth extraction and utilization of point-cloud feature information. If feature information cannot be fully utilized, targets are misdetected or missed (Fig. 1), which has a significant impact on the safety of automatic driving functions. Therefore, deep extraction and utilization of point-cloud information are key to improving the accuracy of 3D target detection.

Methods  This study proposes a two-stage 3D target detection network (DSPF-RCNN, Fig. 1). In the first stage, the unordered original point cloud is divided into a regular voxel space, and point-wise features are converted into voxel-wise features by a convolutional neural network. The down-sampled output of the last layer is transformed into a 2D bird's-eye view (BEV), which is input into the deep feature extraction-region proposal network (DFE-RPN, Fig. 2) for deep extraction of 2D features. By fusing deep and shallow texture features with deep semantic features, the ability of the network to capture 2D image features is enhanced. In the second stage, some points are selected as center points from the latter two 3D down-sampled voxel spaces through farthest point sampling, and these center points are input into the aware-point semantics and position feature fusion (ASPF) module (Fig. 3), which integrates the 3D semantic features and location information of the surrounding point clouds. In this manner, the network can adaptively extract more diverse features of the target, because the center points have a stronger feature aggregation ability when aggregating neighboring points, which improves the network's ability to aggregate different feature information of the target. The center points are then used to aggregate the features of the surrounding points in the 3D voxel space (Fig. 4). Subsequently, region-of-interest pooling is applied to the aggregated features and the target candidate boxes generated in the first stage. Finally, more refined classification and bounding-box regression are performed for the target through fully connected layers.
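The conversion from the final 3D down-sampled voxel volume to the 2D BEV map can be illustrated with a minimal sketch. The abstract does not give implementation details, so the tensor shapes and the simple reshape-based collapse below (the approach commonly used in voxel-based detectors such as SECOND and PV-RCNN) are assumptions rather than the authors' code.

```python
# Minimal sketch (assumed shapes): collapse a dense 3D voxel feature volume into a
# 2D bird's-eye-view (BEV) map by folding the height dimension into the channels,
# so that a standard 2D region proposal network (here, DFE-RPN) can operate on it.
import torch

def voxel_volume_to_bev(voxel_features: torch.Tensor) -> torch.Tensor:
    """voxel_features: (B, C, D, H, W) dense feature volume from the last 3D stage.
    Returns a (B, C*D, H, W) BEV feature map."""
    b, c, d, h, w = voxel_features.shape
    return voxel_features.reshape(b, c * d, h, w)

if __name__ == "__main__":
    x = torch.randn(1, 64, 2, 200, 176)   # hypothetical output of the last 3D conv stage
    print(voxel_volume_to_bev(x).shape)   # torch.Size([1, 128, 200, 176])
```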
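The center points in the second stage are chosen by farthest point sampling, a standard routine; the sketch below shows one straightforward PyTorch implementation, with tensor names and shapes assumed for illustration.

```python
# Standard farthest point sampling (FPS): iteratively pick the point farthest from
# the set already selected, so the chosen center points cover the scene evenly.
import torch

def farthest_point_sampling(points: torch.Tensor, n_samples: int) -> torch.Tensor:
    """points: (N, 3) coordinates of candidate points (e.g., non-empty voxel centers).
    Returns indices of shape (n_samples,) of the selected center points."""
    n = points.shape[0]
    selected = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))            # distance to nearest selected center
    farthest = int(torch.randint(0, n, (1,)))        # arbitrary starting point
    for i in range(n_samples):
        selected[i] = farthest
        d = ((points - points[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)                # update nearest-center distances
        farthest = int(torch.argmax(dist))           # next center = farthest remaining point
    return selected
```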
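The ASPF module is described here only at a high level. As a hedged illustration of the idea, the sketch below fuses per-neighbor semantic features with an encoding of their positions relative to a center point through a learned, adaptive channel weighting, and then aggregates the result onto the center point; the layer sizes, gating scheme, and max-pooling aggregation are assumptions, not the published design.

```python
# Hedged sketch of a point-wise semantics-and-position fusion block (not the authors'
# exact ASPF design): encode relative positions, adaptively blend them with semantic
# features via a learned gate, and aggregate neighbors onto each center point.
import torch
import torch.nn as nn

class PointSemanticsPositionFusion(nn.Module):
    def __init__(self, sem_channels: int, out_channels: int):
        super().__init__()
        self.pos_mlp = nn.Sequential(               # encode (dx, dy, dz) offsets
            nn.Linear(3, out_channels), nn.ReLU(),
            nn.Linear(out_channels, out_channels))
        self.sem_mlp = nn.Sequential(               # project semantic features
            nn.Linear(sem_channels, out_channels), nn.ReLU())
        self.gate = nn.Sequential(                  # adaptive per-channel weighting
            nn.Linear(2 * out_channels, out_channels), nn.Sigmoid())

    def forward(self, sem_feat: torch.Tensor, rel_xyz: torch.Tensor) -> torch.Tensor:
        """sem_feat: (N, K, C) semantic features of K neighbors per center point.
        rel_xyz:  (N, K, 3) neighbor coordinates relative to each center point."""
        sem = self.sem_mlp(sem_feat)
        pos = self.pos_mlp(rel_xyz)
        w = self.gate(torch.cat([sem, pos], dim=-1))
        fused = w * sem + (1.0 - w) * pos           # adaptively blend the two cues
        return fused.max(dim=1).values              # aggregate neighbors per center point

if __name__ == "__main__":
    fuse = PointSemanticsPositionFusion(sem_channels=64, out_channels=128)
    out = fuse(torch.randn(256, 16, 64), torch.randn(256, 16, 3))
    print(out.shape)                                # torch.Size([256, 128])
```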
Results and Discussions  The DSPF-RCNN is tested and evaluated on the official KITTI test and validation sets. On the test set (Table 1), the detection results for Car are better than those of existing mainstream algorithms, with detection accuracies of 89.90%, 81.04%, and 76.45% at the three difficulty levels. On the KITTI validation set (Table 2), at the 11 recall positions, the detection accuracy at the moderate level for Car and Cyclist is improved by 4% compared with those of the SVGA-Net and Part-A2 networks. The DSPF-RCNN accurately detects the three types of targets (Fig. 5). The effectiveness of the proposed modules is further compared and analyzed (Table 5). The results show that, after integrating the 3D semantic and position features of the surrounding point cloud, the center point better aggregates the feature information of the surrounding point cloud in the feature aggregation stage. Furthermore, when the DFE-RPN module is added, the network's ability to capture features increases further, and the extraction of feature information of small targets, such as cyclists and pedestrians, is significantly improved. A comparative analysis of the network's time consumption is also performed, including the time consumed by each module when reasoning over one frame of point-cloud data (Table 6). The comparison between DSPF-RCNN and other two-stage algorithms (Table 7) shows that the total inference time of DSPF-RCNN is 64 ms, which is advantageous in terms of the inference speed of two-stage algorithms. Finally, the algorithm is deployed on a real-vehicle platform to realize online detection (Fig. 7).

Conclusions  In this study, a two-stage target detection algorithm based on laser point clouds, DSPF-RCNN, is proposed. In the first stage, the proposed DFE-RPN module extracts abundant target feature information from 2D images. In the second stage, the proposed ASPF module allows the center points to aggregate the salient features of different targets. Testing on the KITTI test and validation sets and comparison with mainstream methods show that DSPF-RCNN is more advantageous in accurately detecting targets of different sizes, including small targets. At the moderate level on the KITTI validation set, the detection accuracies for Car and Cyclist are improved by approximately 4%, and the total network inference time is 64 ms. Finally, the DSPF-RCNN is applied to a local dataset to verify its engineering value.
Authors
Hu Jie
An Yongpeng
Xu Wencai
Xiong Zongquan
Liu Han
Hu Jie; An Yongpeng; Xu Wencai; Xiong Zongquan; Liu Han (Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan 430070, Hubei, China; Hubei Collaborative Innovation Center for Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China; Hubei Research Center for New Energy & Intelligent Connected Vehicle, Wuhan University of Technology, Wuhan 430070, Hubei, China)
Source
《中国激光》
EI
CAS
CSCD
Peking University Core Journals
2023, Issue 10, pp. 192-202 (11 pages)
Chinese Journal of Lasers
Funding
Major Science and Technology Project of Hubei Province (2020AAA001, 2022AAA001).
Keywords
remote sensing
autonomous driving
LIDAR
3D target detection
feature fusion