Abstract
This paper proposes a novel monocular depth and pose estimation framework that combines the wavelet transform with the self-supervised structure-from-motion paradigm. The approach embeds a 2D discrete wavelet transform into the neural network and implements gradient propagation through it. Traditional convolutional neural networks (CNNs) lose structural information during down-sampling, and this information cannot be recovered in subsequent stages. The loss degrades performance on depth estimation, where complete structural information is crucial. The paper replaces the conventional down-sampling operation with a 2D discrete wavelet transform layer, which better preserves structural details and avoids the accumulation of noise. In the up-sampling stage of the decoder, an inverse wavelet transform layer replaces conventional interpolation, restoring detailed information more effectively and improving the accuracy of the depth map. The proposed method is also more robust to noise than traditional neural networks. Experiments on the KITTI dataset demonstrate that the proposed algorithm achieves excellent performance on self-supervised monocular depth and pose estimation.
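To illustrate why a wavelet layer can preserve what ordinary down-sampling discards, the following is a minimal NumPy sketch of a one-level 2D discrete wavelet transform and its inverse. It assumes the Haar wavelet, which the abstract does not specify, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical; it is not the authors' implementation and does not include the network or gradient propagation. Because every operation is linear, an automatic-differentiation framework could back-propagate through an equivalent layer.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar DWT of an (H, W) array with even H and W.

    Returns four sub-bands (ll, lh, hl, hh), each of shape (H/2, W/2).
    Sub-band naming conventions vary between libraries.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass approximation
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (2h, 2w) array from the sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

In this sketch the `ll` band equals twice the 2x2 average-pooled image, so average pooling amounts to keeping `ll` and dropping the three detail bands. Keeping all four bands halves the spatial resolution without discarding information, and `haar_idwt2` recovers the input exactly, which is the property the abstract relies on for the decoder's up-sampling stage.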
Authors
乔善宝
高永彬
黄勃
余文俊
QIAO Shanbao; GAO Yongbin; HUANG Bo; YU Wenjun (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China)
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core Journal
2023, No. 6, pp. 777-786 (10 pages)
Journal of Wuhan University: Natural Science Edition
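Funding
National Natural Science Foundation of China (61802253, U2033218)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109302, 2020AAA0109300)
Shanghai Chenguang Talent Program (17CG59)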
Keywords
wavelet transform
self-supervised learning
monocular depth estimation
pose estimation
3D perception