Abstract
Scene depth estimation is widely used in 3D vision. To address the low accuracy and poor fine-grained prediction of monocular depth estimation for indoor scenes, a monocular depth estimation network based on an attention mechanism and multi-level correction is proposed. The network first uses a dual-branch module that combines a self-attention Transformer with a convolutional neural network to extract multi-resolution features from color images. A module based on a spatial-domain attention mechanism then progressively fuses the extracted multi-resolution features. Finally, the fused features are processed through multi-level correction, and depth images at different resolutions are progressively estimated. Experimental results show that, compared with similar methods, the proposed network effectively improves the prediction of fine-grained information in depth images, and several of its evaluation metrics improve to varying degrees.
Authors
刘鹏
丁爱华
窦新宇
LIU Peng; DING Aihua; DOU Xinyu (Intelligence and Information Engineering College, Tangshan University, Tangshan 063000, China)
Source
《现代信息科技》
2024, No. 5, pp. 106-110 (5 pages)
Modern Information Technology
Funding
Tangshan Municipal Science and Technology Plan Project (22130205H).