Adaptive multi-modal feature fusion for far and hard object detection

Abstract To address the difficulty of detecting far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with adaptive multi-modal data fusion is proposed, which makes use of multi-neighborhood voxel information and image information. Firstly, an improved ResNet is designed that maintains the structure information of far and hard objects in low-resolution feature maps, making it more suitable for the detection task. Meanwhile, the semantics of each image feature map are enhanced by semantic information from all subsequent feature maps. Secondly, multi-neighborhood context information with different receptive field sizes is extracted to compensate for the sparseness of the point cloud, which improves the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, a multi-modal feature adaptive fusion strategy is proposed, which uses learnable weights to express the contribution of different modal features to the detection task, and voxel attention further enhances the fused feature expression of effective target objects. Experimental results on the KITTI benchmark show that this method outperforms VoxelNet by remarkable margins, increasing the AP by 8.78% and 5.49% on the medium and hard difficulty levels, respectively. The method also achieves better detection performance than many mainstream multi-modal methods, for example exceeding the AP of MVX-Net by 1% on the medium and hard difficulty levels.
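The abstract does not give the fusion formulas, so the following is only a minimal sketch of the general idea of weighting modalities with learnable parameters and then gating the result with a per-voxel attention score. Everything in it is an assumption made for illustration: the function name `fuse_voxel`, the softmax normalization of the modality weights, and the sigmoid gate are not taken from the paper, and in a real network the weights would be trained by backpropagation rather than passed in as plain numbers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_voxel(voxel_feat, image_feat, modality_logits, attn_weights, attn_bias=0.0):
    """Fuse one voxel's LiDAR feature with its image feature (hypothetical helper).

    modality_logits: two learnable scalars (assumed), softmax-normalized so the
                     two modality contributions sum to 1.
    attn_weights:    learnable vector (assumed) that scores the fused feature;
                     the sigmoid of that score gates the voxel.
    """
    w_voxel, w_image = softmax(modality_logits)
    # Weighted sum of the two modalities, element by element.
    fused = [w_voxel * v + w_image * i for v, i in zip(voxel_feat, image_feat)]
    # Per-voxel attention score in (0, 1) computed from the fused feature.
    score = sigmoid(sum(a * f for a, f in zip(attn_weights, fused)) + attn_bias)
    gated = [score * f for f in fused]
    return gated, (w_voxel, w_image), score

# Equal logits give equal modality weights; zero attention weights give a gate of 0.5.
gated, weights, score = fuse_voxel([1.0, 2.0], [3.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(gated, weights, score)
```

With trained parameters, a larger logit for one modality would shift the fused feature toward that modality, and voxels with low attention scores would be suppressed.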

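Authors LI Yang, GE Hongwei (Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122; School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122)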

Affiliations Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence; School of Artificial Intelligence and Computer Science

Source Journal of Measurement Science and Instrumentation (CAS, CSCD), 2021, No. 2, pp. 232-241 (10 pages)

Funding National Youth Natural Science Foundation of China (No. 61806006); Innovation Program for Graduate of Jiangsu Province (No. KYLX160-781); Jiangsu University Superior Discipline Construction Project.

Keywords 3D object detection; adaptive fusion; multi-modal data fusion; attention mechanism; multi-neighborhood features

Classification code TN957.52 [Electronics and Telecommunications: Signal and Information Processing]
