Self-supervised monocular depth estimation in dynamic scenes based on deep learning (original title: 基于深度学习的自监督单目动态场景深度估计综述)


Abstract: Completely static scenes do not exist in the real world. Monocular depth estimation in dynamic scenes obtains the depth of both the dynamic foreground and the static background from a single image. Compared with traditional stereo methods it is more flexible and less costly, and it plays a key role in downstream tasks such as 3D reconstruction and autonomous driving. With the rapid development of deep learning, self-supervised learning, which requires no ground-truth labels, has attracted wide research interest. Researchers in China and abroad have proposed a series of self-supervised monocular depth estimation algorithms that handle dynamic objects in scenes, laying a foundation for work in related fields, but no comprehensive analysis of these methods has yet been conducted. To address this gap, this study systematically reviews and summarizes progress in deep-learning-based self-supervised monocular depth estimation in dynamic scenes. First, the basic models of deep-learning-based self-supervised monocular depth estimation are summarized, the way self-supervised constraints are applied between images is analyzed and explained, and a basic framework diagram of self-supervised monocular depth estimation based on consecutive frames is given. The effect of dynamic objects on scene depth estimation is explained from four aspects: epipolar lines, triangulation, fundamental matrix estimation, and reprojection error. Second, commonly used datasets and evaluation metrics for monocular depth estimation are introduced. The KITTI and Cityscapes datasets provide consecutive outdoor images, and the NYU Depth V2 dataset provides indoor dynamic scene data; these are generally used for model training. The Make3D dataset provides depth data, but its images are not consecutive, so it is generally used to test the generalization ability of a model. The algorithms are analyzed quantitatively using root mean square error (RMSE), logarithmic root mean square error (RMSE log), absolute relative error (Abs Rel), squared relative error (Sq Rel), and accuracy (Acc), and the performance of classic monocular depth estimation models in dynamic scenes is compared. Third, according to how dynamic objects are handled, two research directions are summarized and quantitatively analyzed: robust depth estimation in dynamic scenes, and dynamic object tracking and depth estimation. Robust depth estimation in dynamic scenes extracts dynamic objects and treats them as outliers during model training to minimize their effect, so that training relies only on static background information. Dynamic object tracking and depth estimation accurately distinguishes the dynamic foreground from the static background and processes the two regions separately. Algorithms that detect and segment dynamic objects from optical flow, semantic information, and other cues while estimating their motion are explained, and the advantages and disadvantages of each type of algorithm are summarized on the basis of the commonly used evaluation criteria. Finally, future development directions of monocular depth estimation in dynamic scenes are discussed in terms of network model optimization, online learning and generalization, real-time operation on embedded devices, and domain adaptation of self-supervised learning.
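The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out their formulas, so the definitions below are an assumption based on the convention widely used in the depth estimation literature, with accuracy taken as the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, 1.25^2, and 1.25^3. The function name `depth_metrics` and its arguments are hypothetical and not taken from the paper.

```python
import math


def depth_metrics(pred, gt, thresholds=(1.25, 1.25 ** 2, 1.25 ** 3)):
    """Compute common depth-estimation metrics (assumed standard definitions).

    pred, gt: equal-length, non-empty lists of positive depths,
    one value per valid pixel, in the same unit (e.g. metres).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Absolute and squared relative error, normalised by ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n

    # Root mean square error in linear and in log space.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Accuracy: share of pixels whose depth ratio is under each threshold.
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = [sum(1 for r in ratios if r < t) / n for t in thresholds]

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "acc": acc,
    }


if __name__ == "__main__":
    # Toy values for illustration only; not results from the paper.
    print(depth_metrics([2.0, 4.1, 9.5], [2.0, 4.0, 10.0]))
```

Lower values are better for the four error metrics and higher values are better for accuracy. Because monocular self-supervised models predict depth only up to an unknown scale, evaluation code typically rescales predictions (for example by the ratio of medians) before computing these quantities; that step is omitted here.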

Authors: CHENG Binbin; YU Ying; ZHANG Lei; WANG Ziquan; JIANG Zhipeng (Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China)

Affiliation: Institute of Geospatial Information, Information Engineering University

Source: National Remote Sensing Bulletin (遥感学报), 2024, No. 9, pp. 2170-2186 (17 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

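Funding: National Natural Science Foundation of China (No. 42071340); Songshan Laboratory Project (included in the management system of the Major Science and Technology Programs of Henan Province) (No. 221100211000-01).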

Keywords: remote sensing; dynamic scenes; monocular depth estimation; self-supervised learning; deep learning; 3D reconstruction

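Classification codes: TP751.1 [Automation and Computer Technology: Detection Technology and Automatic Devices]; P2 [Astronomy and Earth Sciences: Surveying and Mapping Science and Technology]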

