Monocular Image Depth Estimation Based on Fully Convolutional Network with a Symmetric Encoder-Decoder Architecture

Abstract: Monocular image depth estimation infers the spatial position of every pixel from an image taken from a single viewpoint, and is important for tasks such as scene understanding and 3D reconstruction. To increase the amount of information carried by the predicted depth map while preserving key details, this paper designs a fully convolutional network with a symmetric encoder-decoder structure, called ResUNet, to perform depth estimation. The network inherits the classic U-Net architecture: an improved ResNet performs feature encoding, and the U-Net decoder design is retained to decode the feature maps into a depth map. This design combines the characteristics of ResNet and U-Net and, through joint optimization, exploits the strengths of both, so that spatial structure and detail information are preserved as far as possible during depth estimation, improving the fidelity and reliability of the predicted depth map. Building on this network, the paper further proposes the ResDepth algorithm. To address the object distortion and loss of detail that tend to arise during depth map prediction, ResDepth introduces a joint loss function that improves the quality of the predicted depth map without additional computational overhead. Finally, comparative experiments on three public datasets, NYU-Depth V2, SUN RGB-D, and KITTI, evaluate the algorithm's performance. The results show that the proposed ResDepth algorithm and joint loss function better preserve spatial structure and geometric detail, and thereby improve the accuracy of depth estimation.
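The abstract does not say which terms make up the joint loss function. As a minimal sketch under that caveat, the example below assumes a weighted sum of two commonly used terms: a per-pixel L1 depth error and an L1 error on horizontal and vertical depth differences, which penalizes lost edges. The function names (`l1_term`, `gradient_term`, `joint_loss`) and the weights `w_depth` and `w_grad` are hypothetical and are not taken from the paper.

```python
from typing import List

Depth = List[List[float]]  # a depth map stored as rows of per-pixel depths


def l1_term(pred: Depth, target: Depth) -> float:
    """Mean absolute per-pixel depth error."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


def gradient_term(pred: Depth, target: Depth) -> float:
    """Mean absolute error between finite differences of the two depth maps."""
    h, w = len(pred), len(pred[0])
    total = 0.0
    count = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal difference to the right neighbour
                dp = pred[y][x + 1] - pred[y][x]
                dt = target[y][x + 1] - target[y][x]
                total += abs(dp - dt)
                count += 1
            if y + 1 < h:  # vertical difference to the neighbour below
                dp = pred[y + 1][x] - pred[y][x]
                dt = target[y + 1][x] - target[y][x]
                total += abs(dp - dt)
                count += 1
    return total / count if count else 0.0


def joint_loss(pred: Depth, target: Depth,
               w_depth: float = 1.0, w_grad: float = 1.0) -> float:
    """Assumed joint loss: weighted sum of the depth and gradient terms."""
    return w_depth * l1_term(pred, target) + w_grad * gradient_term(pred, target)


if __name__ == "__main__":
    target = [[1.0, 1.0], [3.0, 3.0]]   # a depth step between the two rows
    shifted = [[2.0, 2.0], [4.0, 4.0]]  # step preserved, constant offset
    flat = [[2.0, 2.0], [2.0, 2.0]]     # step lost entirely
    print(joint_loss(shifted, target))  # 1.0
    print(joint_loss(flat, target))     # 2.0
```

Both predictions in the usage example have the same mean per-pixel error, but the flat one erases the depth step and so receives a higher joint loss. A loss of this kind only changes the training objective, which is consistent with the abstract's claim that the joint loss function adds no computational overhead to prediction.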

Authors: JIANG Zhongze; CHEN Zhong; XU Xueru; WU Liang (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074; School of Foreign Languages, Huazhong University of Science and Technology, Wuhan 430074)

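Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Foreign Languages, Huazhong University of Science and Technology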

Source: Computer & Digital Engineering (《计算机与数字工程》), 2024, No. 5, pp. 1488-1494 (7 pages)

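Funding: Supported by the Civil Aerospace 13th Five-Year Plan Pre-Research Project (No. D040401-w05) and the project Key Technologies for Emergency Observation and Information Support with Domestic Satellites (No. B0302).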

Keywords: monocular depth estimation; fully convolutional network; dilated convolution; joint loss function

Classification code: TP751 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
