Monocular depth estimation combining pyramid structure and attention mechanism
Abstract Monocular depth estimation predicts a dense depth map from a single color image. A monocular depth estimation algorithm combining a pyramid structure and attention mechanisms was proposed to address the blurred boundaries and insufficient capture of contextual information in current monocular depth estimation algorithms. The algorithm adopted an encoder-decoder framework, in which the encoder was a PVTv2 network, chosen to exploit the strength of Transformer networks in modeling global information and thereby obtain richer global semantic information. The decoder consisted of a main depth estimation branch and two pyramid sub-branches. The main depth estimation branch used spatial and channel attention mechanisms to adaptively focus on the important feature regions and feature channels across the encoder and decoder features. The Laplacian pyramid sub-branch and the depth residual pyramid sub-branch aimed to learn rich local information from the color image and from the depth features of the main branch, and to pass it to the main branch, further addressing the missing details and disordered structures in monocular depth estimation. Experimental results showed that, compared with the advanced algorithm P3Depth, on the indoor public dataset NYU Depth V2 the δ_(1.25) threshold accuracy increased by 1.22%, and the absolute error and root mean square error decreased by 5.8% and 2.8%, respectively. On the outdoor public dataset KITTI, the absolute error, root mean square logarithmic error, and root mean square error decreased by 8.5%, 3.9%, and 0.4%, respectively. The algorithm improved depth estimation accuracy and produced good visual results.
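The abstract reports four evaluation metrics without defining them. The sketch below shows the definitions commonly used in the monocular depth estimation literature; it is an illustration under stated assumptions, not the authors' evaluation code. In particular, reading "absolute error" as the mean absolute relative error is an assumption, and the function name `depth_metrics` and its dictionary keys are hypothetical.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Compute common depth metrics over paired, strictly positive depths.

    Assumption: "absolute error" is taken to mean the mean absolute
    relative error; the abstract does not state the exact definition.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    # Mean absolute relative error.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean square error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Root mean square error of the log-depth difference.
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    return {"delta": delta, "abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```

A higher threshold accuracy is better, while lower values are better for the three error metrics, which matches the direction of the changes reported against P3Depth.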
Authors LI Tao; HU Ting; WU Dandan (School of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan 610039, China)
Source Journal of Graphics (《图学学报》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 3, pp. 454-463 (10 pages)
Funding Sichuan Science and Technology Program (2021YJ0109); National Natural Science Foundation of China (61901392, 62041109)
Keywords deep learning; monocular depth estimation; pyramid structure; attention mechanism; Transformer