Abstract
With the development of deep learning, unsupervised monocular depth estimation has become a research hotspot in computer vision. To address the blurred contours and inaccurate depth values found in estimated depth maps, an unsupervised monocular depth estimation network based on semantic information is proposed, built on an encoder-decoder architecture. To obtain clearer contours, an atrous spatial pyramid pooling (ASPP) layer between the encoder and the decoder refines the semantic information, improving the quality of the generated images. Skip connections from the encoder to the decoder allow the network to extract multi-resolution features. The encoder uses an improved high-resolution network (HRNet) to fuse the multi-resolution features of different layers, and a concatenation strategy fuses the outputs of the intermediate stages before decoding, improving the accuracy of depth estimation. Experimental results on the KITTI dataset show that the proposed method scores lower on the error metrics than current unsupervised monocular depth estimation methods and reaches 89.4%, 96.3%, and 98.1% on the three accuracy metrics, indicating good accuracy.
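The abstract reports three accuracy figures without naming the metrics. A minimal sketch follows, assuming they are the threshold accuracies conventionally reported on KITTI: the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, 1.25^2, and 1.25^3. The function name `threshold_accuracy`, the flat-list input format, and the toy values are hypothetical and are not taken from the paper.

```python
def threshold_accuracy(pred, gt, k=1):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < 1.25**k.

    pred and gt are flat sequences of depths in the same unit.
    Assumption: pairs with a non-positive value are treated as invalid
    (for example, pixels without ground truth) and are skipped.
    """
    limit = 1.25 ** k
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:
            continue  # no usable depth pair at this pixel
        valid += 1
        if max(p / g, g / p) < limit:
            hits += 1
    if valid == 0:
        raise ValueError("no valid depth pairs")
    return hits / valid


# Toy data (made up for illustration): the last pixel has no ground truth.
pred = [10.0, 13.0, 18.0, 30.0, 5.0]
gt = [10.0, 10.0, 10.0, 10.0, 0.0]

# One value per threshold 1.25, 1.25**2, 1.25**3.
scores = [threshold_accuracy(pred, gt, k) for k in (1, 2, 3)]
print(scores)  # [0.25, 0.5, 0.75]
```

Under this reading, each reported percentage is the share of evaluated pixels whose predicted depth lies within the given multiplicative factor of the ground truth, so the values can only rise as the threshold loosens.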
Authors
LI Qi (李颀)
LI Yuzhe (李煜哲)
College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
CSCD
Peking University Core Journal
2024, No. 9, pp. 157-160 (4 pages)
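Funding
Xi'an Science and Technology Plan Project (201806117YF05NC13(1))
Weiyang District, Xi'an, Science and Technology Plan Project (201305)
Shaanxi University of Science and Technology Doctoral Research Start-up Fund (BJ13-15)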
Keywords
depth estimation
unsupervised learning
multi-resolution features
semantic information
encoder-decoder architecture