Abstract
In 3D object detection, image data yield little information about target distance, while point cloud data yield little information about target category. To address this, a method is proposed for converting images into bird's-eye-view (BEV) features: multi-scale image features are flattened along the horizontal dimension, transformed into multi-scale image BEV features by dense transformation layers, and finally reshaped into a global image BEV feature. On this basis, a multi-modal 3D object detection network based on BEV fusion is proposed, which fuses the image BEV features with the point cloud BEV features by either feature concatenation or element-wise addition. Experiments on the KITTI dataset show that the proposed network detects vehicles and pedestrians better than other popular 3D object detection methods.
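The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is one possible reading of the description, not the authors' implementation: the function names (`dense_transform`, `image_to_bev`, `fuse`), the tensor shapes, the ReLU activation, the nearest-neighbour width upsampling, and the stacking of scales along the depth axis are all assumptions made here for illustration, and the weights are random rather than learned.

```python
import numpy as np


def dense_transform(feat, weight, bias, depth):
    """Map one image feature map (C, H, W) to a BEV-like map (C, depth, W).

    Assumption: each image column is flattened over channel and height,
    then passed through one fully connected layer shared by all columns.
    """
    c, h, w = feat.shape
    # One row per image column: (W, C*H).
    columns = feat.transpose(2, 0, 1).reshape(w, c * h)
    # Dense layer followed by ReLU: (W, C*H) @ (C*H, C*depth) -> (W, C*depth).
    out = np.maximum(columns @ weight + bias, 0.0)
    # Unflatten to (W, C, depth) and reorder to (C, depth, W).
    return out.reshape(w, c, depth).transpose(1, 2, 0)


def image_to_bev(feats, params, depths, width):
    """Transform every scale, then stack the results into one global BEV map."""
    parts = []
    for feat, (weight, bias), depth in zip(feats, params, depths):
        part = dense_transform(feat, weight, bias, depth)
        if width % part.shape[2] != 0:
            raise ValueError("target width must be a multiple of each scale's width")
        # Assumption: nearest-neighbour upsampling to a common width.
        parts.append(np.repeat(part, width // part.shape[2], axis=2))
    # Assumption: each scale covers its own depth range, so stack along depth.
    return np.concatenate(parts, axis=1)


def fuse(image_bev, lidar_bev, mode="concat"):
    """Fuse two BEV maps by channel concatenation or element-wise addition."""
    if mode == "concat":
        return np.concatenate([image_bev, lidar_bev], axis=0)
    if mode == "add":
        if image_bev.shape != lidar_bev.shape:
            raise ValueError("element-wise addition needs identical shapes")
        return image_bev + lidar_bev
    raise ValueError(f"unknown fusion mode: {mode}")


# Toy example with random data (hypothetical sizes).
rng = np.random.default_rng(0)
C = 4
feats = [rng.standard_normal((C, 8, 16)), rng.standard_normal((C, 4, 8))]
depths = [6, 10]
params = [
    (0.1 * rng.standard_normal((C * f.shape[1], C * d)), np.zeros(C * d))
    for f, d in zip(feats, depths)
]
image_bev = image_to_bev(feats, params, depths, width=16)  # shape (4, 16, 16)
lidar_bev = rng.standard_normal((C, 16, 16))               # stand-in point cloud BEV
fused_cat = fuse(image_bev, lidar_bev, mode="concat")      # shape (8, 16, 16)
fused_add = fuse(image_bev, lidar_bev, mode="add")         # shape (4, 16, 16)
```

Concatenation keeps the two modalities in separate channels and doubles the channel count seen by the detection head, whereas addition keeps the channel count fixed but requires both BEV maps to have the same shape.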
Authors
Qian Duo; Yin Jun (College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China)
Source
Journal of Nanjing University (Natural Science)
CAS
CSCD
Peking University Core Journals
2023, No. 6, pp. 996-1002 (7 pages)
Funding
Shanghai Pujiang Talent Program (22PJD029)
Keywords
3D object detection
multi-modal fusion
point cloud
bird's-eye view
deep learning