
Unsupervised Monocular Depth Estimation for Autonomous Flight of Drones (cited by: 6)
Abstract: Binocular depth estimation requires costly, bulky hardware, and supervised learning requires a large number of ground-truth depth maps for training. To give a drone scene understanding during flight without these drawbacks, this study proposes an unsupervised monocular depth estimation model for autonomous drone flight. First, the input image is converted into an image pyramid to reduce the influence of differently sized targets on depth estimation. Second, an autoencoder neural network that uses ResNet-50 for feature extraction is designed for image reconstruction: from the input left (or right) view and the pyramid disparity maps it generates, the network reconstructs the corresponding pyramid right (or left) view by bilinear interpolation. Finally, to improve estimation accuracy, structural similarity (SSIM) is introduced into both the image reconstruction loss and the disparity consistency loss, and the total training loss combines the disparity smoothness loss, the image reconstruction loss, and the disparity consistency loss. Experimental results show that, after training on the KITTI dataset, the model achieves higher accuracy and better real-time performance on the KITTI and Make3D datasets than other monocular depth estimation methods, and essentially meets the accuracy and real-time requirements of depth estimation for autonomous drone flight.
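The loss structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ResNet-50 autoencoder is omitted, disparity maps are downsampled here rather than predicted per scale, and the function names, the weights `alpha`, `w_smooth`, and `w_lr`, the 3x3 SSIM window, and the edge-aware form of the smoothness term are all assumptions chosen for illustration. Images are assumed to be grayscale float arrays in [0, 1] from a rectified stereo pair, with the convention that a left pixel at column x matches the right pixel at column x minus the left disparity.

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Return [full, 1/2, 1/4, ...] resolution copies via 2x2 average pooling."""
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        h, w = h - h % 2, w - w % 2                      # crop to an even size
        x = pyramid[-1][:h, :w]
        pyramid.append(x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid

def warp_horizontal(src, shift):
    """Sample src at (y, x + shift[y, x]). Rows stay integer for rectified
    pairs, so bilinear sampling reduces to linear interpolation along x."""
    h, w = src.shape
    xs = np.clip(np.arange(w)[None, :] + shift, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * src[rows, x0] + frac * src[rows, x1]

def box3(x):
    """3x3 mean filter over the valid region (output is 2 smaller per axis)."""
    h, w = x.shape
    return sum(x[i:h - 2 + i, j:w - 2 + j] for i in range(3) for j in range(3)) / 9.0

def dissimilarity(a, b, alpha=0.85):
    """Weighted mix of an SSIM term and an L1 term (alpha is an assumed weight)."""
    mu_a, mu_b = box3(a), box3(b)
    var_a = box3(a * a) - mu_a ** 2
    var_b = box3(b * b) - mu_b ** 2
    cov = box3(a * b) - mu_a * mu_b
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    ssim_term = np.clip((1 - ssim) / 2, 0, 1).mean()
    return alpha * ssim_term + (1 - alpha) * np.abs(a - b).mean()

def smoothness(disp, img):
    """Disparity gradients, down-weighted at image edges (assumed form)."""
    dx = np.abs(np.diff(disp, axis=1)) * np.exp(-np.abs(np.diff(img, axis=1)))
    dy = np.abs(np.diff(disp, axis=0)) * np.exp(-np.abs(np.diff(img, axis=0)))
    return dx.mean() + dy.mean()

def total_loss(left, right, disp_l, disp_r, levels=4, w_smooth=0.1, w_lr=1.0):
    """Sum reconstruction, consistency, and smoothness terms over all scales."""
    loss = 0.0
    pyr = zip(*(build_pyramid(x, levels) for x in (left, right, disp_l, disp_r)))
    for s, (l, r, dl, dr) in enumerate(pyr):
        dl, dr = dl / 2 ** s, dr / 2 ** s                # pixels shrink with scale
        left_hat = warp_horizontal(r, -dl)               # left view from right
        right_hat = warp_horizontal(l, dr)               # right view from left
        recon = dissimilarity(left_hat, l) + dissimilarity(right_hat, r)
        lr = (dissimilarity(warp_horizontal(dr, -dl), dl)
              + dissimilarity(warp_horizontal(dl, dr), dr))
        smooth = smoothness(dl, l) + smoothness(dr, r)
        loss += recon + w_lr * lr + w_smooth * smooth
    return loss
```

Calling `total_loss(left, right, disp_l, disp_r)` with disparities expressed in full-resolution pixels returns one scalar. In a training setup, the disparity maps would come from the network at each scale, and the scalar would be minimized with an automatic-differentiation framework rather than plain NumPy.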
Authors: Zhao Shuanfeng (赵栓峰); Huang Tao (黄涛); Xu Qian (许倩); Geng Longlong (耿龙龙) (College of Mechanical Engineering, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China)
Source: Laser & Optoelectronics Progress (《激光与光电子学进展》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 2, pp. 137-146 (10 pages)
Funding: Natural Science Foundation of Shaanxi Province (2017JM5029); Xi'an Science and Technology Plan Project (CXY2017079CG/RC042).
Keywords: image processing; unsupervised learning; autoencoder neural network; image reconstruction; monocular depth estimation
