Abstract
Scene depth estimation is a fundamental task in scene understanding, and its accuracy reflects how well a computer understands a scene. Monocular depth estimation is inherently an ill-posed problem and therefore relies heavily on prior knowledge of the scene and other auxiliary information; semantic information can effectively help depth estimation make better predictions. To address the particular challenges of monocular depth estimation, a deep neural network model that fuses semantic features is proposed: the semantic information of the target image is fused into the depth network through pixel-adaptive convolution to improve the accuracy of depth estimation. To make full use of multi-scale image features, the basic building block of the DenseNet model is introduced to adaptively fuse the effective features at each scale. Experimental results on the NYU-Depth V2 indoor scene dataset verify the effectiveness of the model and method; the proposed approach achieves competitive results in both qualitative and quantitative evaluation.
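The abstract names pixel-adaptive convolution as the mechanism for injecting semantic information into the depth network. The core idea is that the spatial filter applied at each pixel is re-weighted by a kernel computed from guidance features (here, semantic embeddings). The following is a minimal NumPy sketch of that idea, not the paper's implementation; the Gaussian guidance kernel, the `sigma` parameter, and the single-channel input are illustrative assumptions:

```python
import numpy as np

def pixel_adaptive_conv(x, guide, w, sigma=1.0):
    """Sketch of pixel-adaptive convolution (illustrative, not the paper's code).

    x:     (H, W) input feature map (single channel for simplicity)
    guide: (H, W, C) guidance features, e.g. semantic embeddings
    w:     (k, k) fixed spatial filter
    At each pixel, the filter w is modulated elementwise by a Gaussian
    of the guidance-feature differences, then applied to the local patch.
    """
    k = w.shape[0]
    r = k // 2
    H, W = x.shape
    # Edge-replicate padding so every pixel has a full k x k neighborhood.
    xp = np.pad(x, r, mode="edge")
    gp = np.pad(guide, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]            # (k, k) input patch
            gpatch = gp[i:i + k, j:j + k]            # (k, k, C) guidance patch
            diff = gpatch - guide[i, j]              # feature differences
            # Gaussian adaptation term: near-1 where semantics match,
            # near-0 across semantic boundaries.
            adapt = np.exp(-0.5 * (diff ** 2).sum(-1) / sigma ** 2)
            out[i, j] = (adapt * w * patch).sum()
    return out
```

With a constant guidance map the adaptation term is 1 everywhere and the operation reduces to an ordinary convolution with `w`; with semantic guidance, filtering is suppressed across semantic boundaries, which is what makes the fused semantics useful for sharpening depth edges.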
Author
WU Yu (College of Computer Science and Technology, China University of Petroleum, Qingdao 266580)
Source
Computer & Digital Engineering (《计算机与数字工程》), 2022, No. 6, pp. 1263-1267 (5 pages)
Keywords
depth estimation
semantic information
pixel-adaptive convolution
adaptive feature fusion