Abstract: We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multiresolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
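The abstract does not describe how the sparsity augmentation works. Purely as a hedged illustration, the sketch below shows one generic way to sparsify a dense equirectangular depth map during training. It randomly picks between a simulated spinning LiDAR (a few horizontal beams across a limited vertical field of view) and uniformly random sparse samples. All function names, beam counts, field-of-view limits, and sampling ratios here are hypothetical assumptions, not the paper's method, and uniform random sampling is not claimed to model any particular structured-light sensor.

```python
import random


def lidar_mask(height, width, num_beams=16, fov_deg=(-15.0, 15.0), azimuth_step=1):
    """Mask of pixels hit by a hypothetical spinning LiDAR on an equirectangular grid.

    Row 0 is elevation +90 degrees and the last row is -90 degrees; each beam
    sweeps a full horizontal circle, sampled every `azimuth_step` columns.
    """
    mask = [[False] * width for _ in range(height)]
    low, high = fov_deg
    for b in range(num_beams):
        # Beams evenly spaced in elevation across the (assumed) vertical field of view.
        elevation = low + (high - low) * b / max(num_beams - 1, 1)
        # Map the elevation angle to the equirectangular row that contains it.
        row = min(max(int((90.0 - elevation) * height / 180.0), 0), height - 1)
        for col in range(0, width, azimuth_step):
            mask[row][col] = True
    return mask


def random_mask(height, width, keep_ratio, rng):
    """Keep each pixel independently with probability `keep_ratio`."""
    return [[rng.random() < keep_ratio for _ in range(width)] for _ in range(height)]


def apply_mask(depth, mask):
    """Zero out depth where the mask is False (0.0 marks 'no measurement')."""
    return [[d if keep else 0.0 for d, keep in zip(drow, mrow)]
            for drow, mrow in zip(depth, mask)]


def random_sparsity_augmentation(depth, rng):
    """Sparsify a dense depth map with a randomly chosen, randomly parameterized pattern."""
    height, width = len(depth), len(depth[0])
    if rng.random() < 0.5:
        # Hypothetical LiDAR setups: different beam counts and horizontal resolutions.
        mask = lidar_mask(height, width,
                          num_beams=rng.choice([8, 16, 32, 64]),
                          azimuth_step=rng.choice([1, 2, 4]))
    else:
        # Hypothetical uniform sparsity: 0.1% to 5% of pixels keep their depth.
        mask = random_mask(height, width, keep_ratio=rng.uniform(0.001, 0.05), rng=rng)
    return apply_mask(depth, mask)


if __name__ == "__main__":
    rng = random.Random(42)
    dense = [[2.5] * 1024 for _ in range(512)]  # constant placeholder depth, in metres
    sparse = random_sparsity_augmentation(dense, rng)
    print(sum(d > 0 for row in sparse for d in row), "valid depth samples")
```

Drawing both the pattern type and its parameters anew for every training sample is what exposes a network to many sparsity levels; plain nested lists keep the sketch dependency-free, whereas a real training pipeline would use array or tensor operations.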
Funding: … funding from the Autonomous Region of Sardinia under project XDATA. Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti received funding from the European Union's H2020 research and innovation programme under grant 813170 (EVOCATION).