Improved Two-Stage 3D Object Detection Algorithm for Roadside Scenes with Enhanced PointPillars and Transformer
Abstract To address the high missed-detection rate for distant vehicles and the high false-detection rate for pedestrians in complex scenes in roadside point cloud object detection, this study proposes a two-stage three-dimensional object detection algorithm for roadside scenes that improves on PointPillars and Transformer. The first stage is based on PointPillars: the backbone network embeds the SimAM attention mechanism, which learns similarity information so that the network attends to important features, and the standard convolutional blocks in the downsampling section are replaced with convolutional blocks that have a residual structure to improve network performance. The second stage uses a Transformer to refine the candidate boxes generated in the first stage: the encoder builds and encodes features from the raw points, while the decoder applies channel weighting to enhance channel information, which improves detection accuracy and reduces false detections. The algorithm was evaluated on the roadside dataset DAIR-V2X-I and the vehicle-side dataset KITTI. The experimental results show a clear improvement in detection accuracy over other publicly available algorithms. Compared with the baseline PointPillars at the moderate difficulty level, detection accuracy for cars, pedestrians, and cyclists improved by 1.9, 10.5, and 2.11 percentage points, respectively, on DAIR-V2X-I, and by 2.34, 4.73, and 8.17 percentage points, respectively, on KITTI.
Authors Wang Liangzi; Huang Miaohua; Liu Ruoying; Bi Chengcheng; Hu Yongkang (Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan 430070, Hubei, China; Hubei Collaborative Innovation Center for Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China; Hubei Research Center for New Energy & Intelligent Connected Vehicle, Wuhan University of Technology, Wuhan 430070, Hubei, China)
Source Laser & Optoelectronics Progress (《激光与光电子学进展》; indexed in CSCD and the Peking University Core Journals list), 2024, No. 18, pp. 403-412 (10 pages)
Funding National Key Research and Development Program of China (2018YFE0105500).
Keywords 3D object detection; false positives and missed detections; Transformer; attention mechanism; residual structure
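
The abstract states that the first-stage backbone embeds the SimAM attention mechanism but gives no implementation details. As a rough illustration only, the sketch below applies the parameter-free SimAM weighting, as it is commonly formulated, to a single feature channel in plain Python. The function name `simam_channel`, the 2D-list input format, and the regularizer value `lam=1e-4` are assumptions made for this example; how the authors actually wire SimAM into the PointPillars backbone is not described in this record.

```python
import math


def simam_channel(channel, lam=1e-4):
    """Reweight one feature channel (a 2D list of floats) with SimAM-style gating.

    Hypothetical helper for illustration; not the authors' implementation.
    Each activation is scaled by sigmoid(d / (4 * (var + lam)) + 0.5), where d is
    its squared deviation from the channel mean, so activations that stand out
    from the rest of the channel receive a larger gate.
    """
    flat = [v for row in channel for v in row]
    n = len(flat) - 1  # number of "other" neurons in the channel
    if n < 1:
        raise ValueError("channel needs at least two elements")

    mean = sum(flat) / len(flat)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        gated = []
        for v, d in zip(row, dev_row):
            inv_energy = d / (4.0 * (var + lam)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            gated.append(v * gate)
        out.append(gated)
    return out


# Usage: one activation stands out from an otherwise uniform 3x3 channel.
example = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
weighted = simam_channel(example)
```

In a real network the same computation would run per channel over a batched tensor; the list-based form here is only meant to make the arithmetic easy to follow.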