Monocular depth estimation combining pyramid structure and attention mechanism


Abstract: Monocular depth estimation predicts a dense depth map from a single color image. To address the blurred boundaries and insufficient capture of contextual information in current monocular depth estimation algorithms, a monocular depth estimation algorithm combining a pyramid structure and an attention mechanism is proposed. The algorithm adopts an encoder-decoder framework. The encoder is a PVTv2 network, chosen to exploit the strength of Transformers in modeling global information and thereby obtain richer global semantic information. The decoder consists of a main depth estimation branch and two pyramid sub-branches. The main branch uses spatial and channel attention mechanisms to adaptively focus on the important feature regions and feature channels across the encoder and decoder features. The Laplacian pyramid sub-branch and the depth residual pyramid sub-branch learn rich local information from the color image and from the depth features of the main branch, and pass this information to the main branch, further addressing missing details and disordered structures in monocular depth estimation. Experimental results show that, compared with the state-of-the-art algorithm P3Depth, on the indoor public dataset NYU Depth V2 the proposed algorithm improves the δ < 1.25 threshold accuracy by 1.22% and reduces the absolute error and root mean square error by 5.8% and 2.8%, respectively. On the outdoor public dataset KITTI, it reduces the absolute error, root mean square logarithmic error, and root mean square error by 8.5%, 3.9%, and 0.4%, respectively. The algorithm improves depth estimation accuracy and produces good visual results.
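The abstract does not describe the internal layout of the attention modules in the main branch. The sketch below assumes a CBAM-style arrangement: encoder and decoder features are concatenated along the channel axis, then gated by channel attention (squeeze-and-excitation style) followed by spatial attention. The function names, the weights `w1`, `w2`, `a`, `b`, and the attention order are hypothetical, and nested Python lists of shape C x H x W stand in for tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C x H x W) by a scalar in (0, 1).

    Global average pool -> linear (w1: hidden x C) -> ReLU ->
    linear (w2: C x hidden) -> sigmoid. Weights are hypothetical.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    hidden = [max(0.0, sum(w * z for w, z in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    out = [[[v * g for v in r] for r in ch] for ch, g in zip(feat, gates)]
    return out, gates


def spatial_attention(feat, a, b, bias=0.0):
    """Gate each pixel by sigmoid(a * channel-mean + b * channel-max + bias).

    CBAM applies a 7x7 convolution to the [mean, max] maps; a per-pixel
    linear mix is used here only to keep the sketch short.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[sigmoid(a * sum(feat[k][i][j] for k in range(c)) / c
                     + b * max(feat[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]
    out = [[[feat[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
           for k in range(c)]
    return out, mask


def attend_skip(enc, dec, w1, w2, a, b):
    """Fuse an encoder skip feature with a decoder feature (assumed design)."""
    fused = enc + dec  # list concatenation = concatenation along channels
    fused, _ = channel_attention(fused, w1, w2)
    fused, _ = spatial_attention(fused, a, b)
    return fused
```

With one 2x2 encoder channel and one 2x2 decoder channel, `w1` must be 1 x 2 (one hidden unit) and `w2` must be 2 x 1. Applying channel attention before spatial attention follows CBAM's published order; the paper may combine the two differently.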
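The Laplacian pyramid sub-branch draws local detail from the color image. To show what such a pyramid contains, the sketch below decomposes a single-channel image into per-scale detail (band-pass) levels plus a coarse remainder, then rebuilds it exactly. The 2x2 box-filter downsampling and nearest-neighbour upsampling are simplifying assumptions, not the paper's operators, and all function names are illustrative. The coarse-to-fine `reconstruct` loop (upsample, then add a residual) follows the same pattern that residual-pyramid depth decoders typically use, with predicted depth residuals in place of image details.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (a box filter; Gaussian
    blurring before decimation is the more common choice)."""
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]


def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def combine(a, b, sign):
    """Element-wise a + sign * b for two equally sized 2-D lists."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse].

    Height and width of img must be divisible by 2 ** levels.
    """
    bands, current = [], img
    for _ in range(levels):
        smaller = downsample(current)
        # High-frequency detail lost by going down one scale.
        bands.append(combine(current, upsample(smaller), -1))
        current = smaller
    bands.append(current)  # low-pass remainder at the coarsest scale
    return bands


def reconstruct(bands):
    """Coarse-to-fine: upsample the current estimate, then add the next residual."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = combine(upsample(current), band, +1)
    return current
```

A 4x4 image with `levels=2` yields detail maps of size 4x4 and 2x2 plus a 1x1 remainder. A constant image produces all-zero detail maps, which shows that the detail levels respond only to local structure such as edges.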
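The reported gains use standard depth-estimation metrics. The sketch below computes their common definitions on flat lists of depth values. It assumes that "absolute error" means the mean absolute relative error (AbsRel), the usual convention on NYU Depth V2 and KITTI, and that the percentage changes in the abstract are relative to P3Depth's values. The function names are illustrative, and real evaluations also apply dataset-specific depth caps and crops that are omitted here.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular-depth metrics over paired depth values.

    Pixels with non-positive prediction or ground truth are treated as invalid.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)
    return {
        # Fraction of pixels with max(p/g, g/p) < 1.25 (higher is better).
        "delta1": sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n,
        # Mean absolute relative error (lower is better).
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        # Root mean square error, in the units of the depth values.
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        # Root mean square error of natural-log depths.
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n),
    }


def relative_change(new, baseline):
    """Relative change of a metric against a baseline (negative = reduction)."""
    return (new - baseline) / baseline
```

For predictions `[2.0, 4.0, 5.0, 0.0]` against ground truth `[2.0, 5.0, 5.0, 3.0]`, the last pixel is ignored as invalid. δ1 is 2/3 because the 4.0 vs 5.0 pixel has a ratio of exactly 1.25, which fails the strict test. AbsRel is 0.2/3 ≈ 0.067 and RMSE is √(1/3) ≈ 0.577. A hypothetical AbsRel falling from 0.100 to 0.094 gives `relative_change(0.094, 0.100)` ≈ -0.06, which is a 6% reduction.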

Authors: LI Tao, HU Ting, WU Dandan (School of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan 610039, China)

Institution: School of Electrical Engineering and Electronic Information, Xihua University

Source: Journal of Graphics (《图学学报》), 2024, No. 3, pp. 454-463 (10 pages); indexed in CSCD and the Peking University Core Journals list

Funding: Sichuan Provincial Science and Technology Plan Project (2021YJ0109); National Natural Science Foundation of China (61901392, 62041109)

Keywords: deep learning; monocular depth estimation; pyramid structure; attention mechanism; Transformer

Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]


References (2)

1. XIE Zhao, MA Hailong, WU Kewei, GAO Yang, SUN Yongxuan. Scene depth estimation based on sampling aggregation network[J]. Acta Automatica Sinica, 2020, 46(3): 600-612.
2. Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. PVT v2: Improved baselines with Pyramid Vision Transformer[J]. Computational Visual Media, 2022, 8(3): 415-424.
