On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

Abstract: Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistency constraints are highly vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, so they are not robust enough to handle diverse scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and this offset field is used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the “point-to-point” alignment paradigm to a “region-to-region” one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust, owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analyses validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
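The voxel density idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that): the function names `voxel_density` and `density_alignment_loss`, the hard histogram binning, the normalization by point count, and the L1 comparison are all assumptions chosen for illustration. Hard binning is also not differentiable, so a loss used for training would need a soft assignment of points to voxels.

```python
import numpy as np


def voxel_density(points, bounds_min, bounds_max, grid_shape):
    """Fraction of in-bounds points that falls into each voxel of a regular grid.

    points:      (N, 3) array of 3D points.
    bounds_min:  length-3 lower corner of the voxelized volume (assumed).
    bounds_max:  length-3 upper corner of the voxelized volume (assumed).
    grid_shape:  number of voxels along each axis, e.g. (4, 4, 4).
    """
    points = np.asarray(points, dtype=np.float64)
    # Count points per voxel; points outside the bounds are ignored.
    counts, _ = np.histogramdd(
        points,
        bins=grid_shape,
        range=list(zip(bounds_min, bounds_max)),
    )
    total = counts.sum()
    return counts / total if total > 0 else counts


def density_alignment_loss(points_ref, points_src, bounds_min, bounds_max, grid_shape):
    """L1 distance between the voxel densities of two point clouds.

    Both clouds are assumed to be expressed in the same coordinate frame,
    i.e. the source cloud has already been transformed by the relative pose.
    """
    d_ref = voxel_density(points_ref, bounds_min, bounds_max, grid_shape)
    d_src = voxel_density(points_src, bounds_min, bounds_max, grid_shape)
    return float(np.abs(d_ref - d_src).sum())


if __name__ == "__main__":
    lo, hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
    a = np.array([[0.10, 0.10, 0.10]])
    b = np.array([[0.20, 0.20, 0.20]])  # small shift, same voxel as a
    c = np.array([[0.90, 0.90, 0.90]])  # large shift, different voxel
    print(density_alignment_loss(a, b, lo, hi, (2, 2, 2)))
    print(density_alignment_loss(a, c, lo, hi, (2, 2, 2)))
```

In this sketch, two points that land in the same voxel contribute no loss even though their coordinates differ, while a point that moves to another voxel does. That is the sense in which a density comparison is "region-to-region" rather than "point-to-point".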

Authors: Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, Dacheng Tao

Affiliations: School of Computer Science; Shenzhen International Graduate School; School of Information Technology & Electrical Engineering

Source: Machine Intelligence Research (机器智能研究, English edition), indexed in EI and CSCD, 2024, Issue 3, pp. 495-513 (19 pages)

Keywords: 3D vision; depth estimation; cross-view consistency; self-supervised learning; monocular perception

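Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]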
