Self-supervised Monocular Depth Estimation with Self-Distillation and Dense Skip Connection

Xuezhi Xiang, Wei Li, Yao Wang, Abdulmotaleb El Saddik
DOI: https://doi.org/10.1016/j.cviu.2024.104048
IF: 4.886
2024-01-01
Computer Vision and Image Understanding
Abstract: Monocular depth estimation (MDE) is crucial in a wide range of applications, including robotics, autonomous driving, and virtual reality. Self-supervised monocular depth estimation has emerged as a promising MDE approach because it does not require hard-to-obtain depth labels during training, and the multi-scale photometric loss is widely used as its self-supervised signal. However, the multi-scale photometric loss is a weak training signal and might disturb good intermediate feature representations. In this paper, we propose a successive depth map self-distillation (SDM-SD) loss, which is combined with a single-scale photometric loss to replace the multi-scale photometric loss. Moreover, considering that multi-stage feature representations are essential for dense prediction tasks such as depth estimation, we also propose a dense skip connection, which efficiently fuses the intermediate features of the encoder and fully utilizes them in each stage of the decoder in our encoder–decoder architecture. By applying the successive depth map self-distillation loss and the dense skip connection, our proposed method achieves state-of-the-art performance on the KITTI benchmark and exhibits the best generalization ability on the challenging indoor NYUv2 dataset.
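To make the loss structure described in the abstract concrete, the following is a minimal NumPy sketch of one way a successive self-distillation term could be combined with a single-scale photometric term. The abstract does not specify the distillation direction, the distance function, the resampling method, or the loss weight, so everything here is an assumption for illustration: each coarser depth map is treated as the student of the next finer map, the distance is a mean absolute difference, the teacher is resampled by 2x average pooling, and the names `downsample2x`, `successive_self_distillation_loss`, `total_loss`, and `weight` are hypothetical.

```python
import numpy as np


def downsample2x(depth):
    """Average-pool a (H, W) depth map by a factor of 2 (H and W must be even)."""
    h, w = depth.shape
    return depth.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def successive_self_distillation_loss(depth_maps):
    """Mean L1 distance between each depth map and its finer neighbour.

    depth_maps: list ordered from finest to coarsest, where each map has half
    the height and width of the previous one (an assumed layout).
    """
    total = 0.0
    for teacher, student in zip(depth_maps[:-1], depth_maps[1:]):
        # Assumption: the finer map acts as the teacher. In a training
        # framework this target would be detached from the gradient graph.
        target = downsample2x(teacher)
        total += np.abs(student - target).mean()
    return total / (len(depth_maps) - 1)


def total_loss(photometric_full_res, depth_maps, weight=1.0):
    """Single-scale photometric term plus the weighted distillation term.

    photometric_full_res is a precomputed scalar here; `weight` is a
    hypothetical hyperparameter, not a value taken from the paper.
    """
    return photometric_full_res + weight * successive_self_distillation_loss(depth_maps)
```

In this sketch the photometric loss is evaluated only once, at full resolution, and the lower-resolution predictions receive their training signal from neighbouring depth maps rather than from their own photometric terms, which is the replacement the abstract describes.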