Abstract
Because it does not depend on ground-truth depth, self-supervised learning is a promising alternative for training monocular depth estimation models. It builds its own supervision signal with the help of other tools, such as view synthesis and pose networks. However, this may involve more training parameters and greater time consumption. This paper proposes a monocular depth prediction framework that jointly learns the depth values and the pose transformation between images in an end-to-end manner. During training, the depth network replaces every square-kernel layer with an asymmetric convolution block to strengthen its ability to extract image features. At inference time, the asymmetric kernels are fused and converted back into the original network structure, so the model predicts more accurate image depth without any extra computation. The network is trained and tested on the KITTI monocular dataset. The evaluation results demonstrate that the depth model outperforms some state-of-the-art (SOTA) approaches and can reduce the inference time of depth prediction. Additionally, the proposed model adapts well to the Make3D dataset.
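The kernel fusion step can be illustrated with a minimal sketch. It assumes a single channel, 3x3 square kernels with 1x3 and 3x1 asymmetric branches, zero padding, and no batch normalisation; none of these details are stated in the abstract, and the helper names `conv2d_same` and `fuse_kernels` are hypothetical. Because convolution is linear, summing the three branch outputs gives the same result as a single convolution with the summed kernel.

```python
import random


def conv2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D list with an odd-sized kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < rows and 0 <= c < cols:  # outside the image counts as zero
                        acc += image[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def fuse_kernels(square, horizontal, vertical):
    """Add a 1x3 and a 3x1 kernel onto the centre row and column of a 3x3 kernel."""
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]
    for a in range(3):
        fused[a][1] += vertical[a][0]
    return fused


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights or an input feature map."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
image = rand_matrix(6, 6, rng)
k_square = rand_matrix(3, 3, rng)
k_hor = rand_matrix(1, 3, rng)
k_ver = rand_matrix(3, 1, rng)

# Training-time form: three parallel branches whose outputs are summed.
branches = [conv2d_same(image, k) for k in (k_square, k_hor, k_ver)]
y_train = [[sum(b[i][j] for b in branches) for j in range(6)] for i in range(6)]

# Inference-time form: one square convolution with the fused kernel.
y_infer = conv2d_same(image, fuse_kernels(k_square, k_hor, k_ver))

max_diff = max(abs(y_train[i][j] - y_infer[i][j]) for i in range(6) for j in range(6))
print(max_diff < 1e-9)
```

In this simplified setting the fused network performs one convolution per layer instead of three, which is why the extra branches cost nothing at inference time.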
Funding
Natural Science Foundation of Shanghai, Grant/Award Number: 61922063
National Key R&D Program of China, Grant/Award Number: 2018YFB1305003
Fundamental Research Funds for the Central Universities
Shanghai Hong Kong Macao Taiwan Science and Technology Cooperation Project, Grant/Award Number: 21550760900
Shanghai Municipal Science and Technology Major Project, Grant/Award Number: 2021SHZDZX0100