Abstract
Efficiently extracting densely perceived image features and truly 3D-perceived point cloud features, and fully exploiting their complementary strengths, is the key to improving 3D object recognition. This paper proposes a multimodal framework that fuses images and point clouds for 3D semantic segmentation. The image and point cloud feature extraction branches are mutually independent. A depth estimation fusion network is designed for the image branch, effectively fusing densely perceived image semantic information with depth features explicitly supervised by ground truth, thereby compensating for the disorder and sparsity of point clouds. The voxel feature extraction method is also improved to reduce the information loss caused by point cloud voxelization. After the image and point cloud branches extract multi-scale features, a dynamic feature fusion module strengthens the network's ability to extract key features and obtains global features more effectively. In addition, a point-level multimodal fusion data augmentation strategy is proposed, which increases sample diversity while effectively alleviating class imbalance. In comparative experiments on the public Pandaset dataset, the proposed multimodal fusion framework shows better performance and stronger robustness, with especially pronounced gains on small-sample and small-target categories.
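As an illustration of the depth estimation fusion described above, below is a minimal PyTorch sketch of an image-branch block in which a depth head, supervised by ground-truth depth (e.g. from projected LiDAR points), produces depth features that are fused back into the semantic features. The module name, channel sizes, and layer choices are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a depth-estimation fusion block for the image branch.
# All module names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class DepthFusionBlock(nn.Module):
    """Fuse RGB semantic features with features from a depth head that is
    explicitly supervised by ground-truth depth (e.g. projected LiDAR)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.depth_head = nn.Conv2d(channels, 1, kernel_size=1)  # predicts depth
        self.depth_enc = nn.Sequential(                          # re-encodes depth
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor):
        depth = self.depth_head(rgb_feat)            # trained against GT depth
        depth_feat = self.depth_enc(depth)
        fused = self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))
        return fused, depth                          # depth feeds the depth loss

block = DepthFusionBlock(64)
fused, pred_depth = block(torch.randn(2, 64, 32, 32))  # (2,64,32,32), (2,1,32,32)
```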
Objective In the field of computer vision, cameras and LiDAR have complementary advantages. Cameras offer dense perception and RGB information and can capture rich semantic information, while LiDAR ranges more accurately and provides more precise spatial information. How to exploit the strengths of both sensors so that their information complements each other is the key to improving 3D object recognition. Single-modal LiDAR point cloud recognition frameworks, whether point-based or voxel-based, suffer either from long processing times or from the information loss caused by point cloud voxelization. Existing multimodal networks that fuse images rely too heavily on the point cloud input, fail to reduce the information loss caused by voxelization, weaken the high-dimensional semantic information provided by images, and thus do not fully exploit the complementarity between point clouds and images. To address these issues, this paper improves the feature generation network and the multimodal fusion strategy, and proposes a point-level multimodal data augmentation strategy to further enhance model performance.

Methods The multimodal network framework uses independent image and point cloud branches to extract multi-scale features and fuses them at the feature level (Fig. 1). The image branch uses a depth estimation fusion network to fuse densely perceived image semantic information with ground-truth-supervised depth features (Fig. 2), compensating for the disorder and sparsity of point clouds. In the point cloud branch, the feature extraction method for voxelized point clouds is improved (Fig. 3): instead of using only the voxel centre point features, vector, standard deviation, and extremum features are fused (see the code sketch after the Results). A dynamic feature fusion module (Fig. 4) strengthens the network's ability to extract key features and obtains global features more effectively. A point-level multimodal fusion data augmentation strategy is proposed, which increases sample diversity and alleviates class imbalance to a certain extent, effectively improving model performance.

Results and Discussions Experiments are conducted on Pandaset, an open-source public dataset for L5-level autonomous driving, with IoU as the evaluation metric for semantic segmentation (a sketch of the metric follows the Conclusions). We first visualize the proposed point-level multimodal fusion data augmentation strategy on Pandaset and find that it outperforms previous methods in visual quality and in the realism of the augmented samples (Fig. 5-6). Comparative experiments against mainstream single-modal point cloud and multimodal image-point-cloud 3D semantic segmentation algorithms show that the proposed algorithm improves performance on most labels and on mIoU (Tab. 1), with more significant gains on distant or small targets. This demonstrates the effectiveness of the proposed algorithm, and ablation experiments further verify the contribution of each proposed module to model performance (Tab. 2). Additional comparative experiments on data augmentation show that the proposed point-level augmentation strategy also outperforms previous augmentation methods on object detection tasks (Tab. 3).
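To make the improved voxel encoding in Methods concrete, here is a minimal NumPy sketch that replaces the centre-point-only feature with fused vector (centroid-offset), standard-deviation, and extremum features; the exact feature set, pooling, and fusion order are assumptions.

```python
# Minimal sketch of statistics-based voxel feature encoding: each voxel is
# described by fused centroid/offset, standard-deviation, and extremum
# features instead of only its centre point. Illustrative assumptions only.
import numpy as np

def encode_voxel(points: np.ndarray) -> np.ndarray:
    """points: (N, 3) xyz coordinates of the points inside one voxel."""
    centroid = points.mean(axis=0)          # voxel centroid
    offsets = points - centroid             # per-point offset vectors
    pooled = np.abs(offsets).max(axis=0)    # pool offsets to a fixed size
    std = points.std(axis=0)                # per-axis spread (std-dev feature)
    vmin = points.min(axis=0)               # extremum features
    vmax = points.max(axis=0)
    return np.concatenate([centroid, pooled, std, vmin, vmax])  # (15,)

feat = encode_voxel(np.random.rand(128, 3))
print(feat.shape)  # (15,)
```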
Conclusions This paper improves the image and point cloud feature extraction networks and designs a multimodal framework for image and point cloud fusion that combines the advantages of densely perceived images and truly 3D-perceived point clouds to achieve information complementarity. The resulting multimodal fusion framework improves 3D object recognition performance, with the gains being more significant on small samples and small targets. Comparative and ablation experiments on the open-source Pandaset dataset demonstrate the effectiveness of the proposed algorithm.
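For reference, a minimal sketch of the per-class IoU metric reported in the experiments (mIoU is the mean over classes); the class indexing and the absence of an ignore label are simplifying assumptions.

```python
# Minimal sketch of per-class IoU for semantic segmentation evaluation.
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    ious = np.full(num_classes, np.nan)     # NaN marks classes absent from both
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

pred = np.random.randint(0, 4, size=1000)
gt = np.random.randint(0, 4, size=1000)
print(np.nanmean(per_class_iou(pred, gt, 4)))  # mIoU over present classes
```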
Authors
Chao Qi; Zhao Yandong; Liu Shengbo (School of Engineering, Beijing Forestry University, Beijing 100080, China)
Source
Infrared and Laser Engineering (《红外与激光工程》)
EI
CSCD
Peking University Core Journals (北大核心)
2024, Issue 5, pp. 253-267 (15 pages)
Funding
Young Scientists Fund of the National Natural Science Foundation of China (32101590).
Keywords
image point cloud fusion
depth estimation fusion
voxel features
semantic segmentation
data augmentation