Abstract
In current monocular depth estimation algorithms, stacked convolutional layers and excessive downsampling operations cause a loss of feature-map resolution and high-level information, degrading the overall accuracy of the depth map. To address this problem, this paper proposes a monocular depth estimation algorithm based on multi-scale feature fusion. A progressive encoder-decoder structure extracts information at different scales, level by level from shallow to deep, and features of different resolutions from different levels are connected to form a multi-scale feature fusion structure. The encoder follows the U^(2)-Net design and uses Vision Transformer modules internally, giving the model a global receptive field during encoding while avoiding downsampling operations, thereby reducing the loss of feature-map resolution and high-level information. The decoder uses U-shaped residual blocks to better fuse the multi-scale features within each stage. Experiments on the KITTI and NYU-Depth V2 datasets show that the proposed algorithm outperforms most comparable algorithms on all metrics.
In current monocular depth estimation algorithms, stacked convolutional layers and excessive downsampling operations lead to the loss of feature-map resolution and high-level information, affecting the overall accuracy of the depth map. To address this issue, this paper proposes a monocular depth estimation model based on multi-scale feature fusion. The model adopts a progressive encoder-decoder structure to extract information at different scales from shallow to deep levels, and features of different resolutions at different levels are connected to form a multi-scale feature fusion structure. The encoder follows the U^(2)-Net design and incorporates Vision Transformer modules, which provide a global receptive field during encoding while avoiding downsampling operations, reducing the loss of feature-map resolution and high-level information. The decoder incorporates U-shaped residual blocks to better fuse multi-scale features within different stages. The method was tested on the KITTI and NYU Depth V2 datasets, and the experimental results show that it achieves competitive performance on both datasets.
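To make the decoder's fusion mechanism concrete, the following is a minimal PyTorch sketch of a U-shaped residual block in the spirit of U^(2)-Net's RSU blocks: a small encoder-decoder nested inside one block whose output is added back to the block input. All module names, channel sizes, and the inner depth are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNReLU(nn.Module):
    """3x3 convolution followed by batch norm and ReLU (spatial size preserved)."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))


class UShapedResidualBlock(nn.Module):
    """Hypothetical U-shaped residual block: an inner encoder-decoder re-extracts
    multi-scale context, and the result is fused with the block input via a
    residual connection."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv_in = ConvBNReLU(in_ch, out_ch)
        # Inner encoder path (one downsampling step for brevity).
        self.enc1 = ConvBNReLU(out_ch, mid_ch)
        self.enc2 = ConvBNReLU(mid_ch, mid_ch)
        self.bottleneck = ConvBNReLU(mid_ch, mid_ch, dilation=2)
        # Inner decoder path (skip concatenation then upsampling).
        self.dec2 = ConvBNReLU(mid_ch * 2, mid_ch)
        self.dec1 = ConvBNReLU(mid_ch * 2, out_ch)

    def forward(self, x):
        xin = self.conv_in(x)
        e1 = self.enc1(xin)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        b = self.bottleneck(e2)
        d2 = self.dec2(torch.cat([b, e2], dim=1))
        d2 = F.interpolate(d2, size=e1.shape[2:], mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return d1 + xin  # residual fusion with the block input


if __name__ == "__main__":
    block = UShapedResidualBlock(in_ch=64, mid_ch=32, out_ch=64)
    feat = torch.randn(1, 64, 48, 160)   # example decoder feature map
    print(block(feat).shape)             # torch.Size([1, 64, 48, 160])
```

In this sketch the residual connection preserves the resolution of the incoming feature map while the nested encoder-decoder injects context from coarser scales, which is the fusion role the abstract attributes to the decoder's U-shaped residual blocks.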
Author
周晓吉
ZHOU Xiaoji (School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source
《智能计算机与应用》
2024, No. 9, pp. 34-40 (7 pages)
Intelligent Computer and Applications