Abstract: Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on large amounts of dense, high-resolution (HR) ground-truth depth maps. In practice, however, color images are usually captured at much higher resolution than depth maps, which leads to a resolution mismatch between input and supervision. In this paper, we propose a novel weakly-supervised framework that trains a monocular depth estimation network to generate HR depth maps under resolution-mismatched supervision, i.e., the inputs are HR color images and the ground truth consists of low-resolution (LR) depth maps. The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network used for distillation. Specifically, for the monocular depth estimation network, the input color image is first downsampled to obtain an LR version with the same resolution as the ground-truth depth. Then, both the HR and LR color images are fed into the monocular depth estimation network to obtain the corresponding estimated depth maps. We introduce three losses to train the network: 1) a reconstruction loss between the estimated LR depth and the ground-truth LR depth; 2) a reconstruction loss between the downsampled estimated HR depth and the ground-truth LR depth; and 3) a consistency loss between the estimated LR depth and the downsampled estimated HR depth. In addition, we design a depth-to-depth reconstruction network. Through a distillation loss, the features of the two networks maintain structural consistency in affinity space, which ultimately improves the performance of the estimation network. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes and is competitive with, or even better than, supervised ones.
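The three training losses listed in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract specifies neither the distance measure nor the downsampling operator, so the sketch assumes a mean absolute (L1) error and average pooling by an integer factor. The function names `downsample`, `l1`, and `weak_supervision_losses` are hypothetical, and the distillation loss is left out.

```python
import numpy as np


def downsample(depth: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 2-D map by an integer factor (assumed operator)."""
    h, w = depth.shape
    if h % factor or w % factor:
        raise ValueError("map height and width must be divisible by factor")
    # Group pixels into factor x factor blocks and average each block.
    blocks = depth.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error (assumed distance; the abstract does not name one)."""
    return float(np.mean(np.abs(a - b)))


def weak_supervision_losses(pred_hr, pred_lr, gt_lr, factor):
    """Return the three loss terms described in the abstract.

    pred_hr: depth estimated from the HR color image
    pred_lr: depth estimated from the downsampled (LR) color image
    gt_lr:   LR ground-truth depth
    """
    pred_hr_down = downsample(pred_hr, factor)
    return {
        "rec_lr": l1(pred_lr, gt_lr),            # 1) LR estimate vs. LR ground truth
        "rec_hr": l1(pred_hr_down, gt_lr),       # 2) downsampled HR estimate vs. LR ground truth
        "consistency": l1(pred_lr, pred_hr_down),  # 3) LR estimate vs. downsampled HR estimate
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred_hr = rng.uniform(1.0, 10.0, size=(8, 8))  # stand-in for a network output
    gt_lr = downsample(pred_hr, 2)                 # toy LR ground truth
    pred_lr = gt_lr + 0.5                          # toy LR estimate with a constant bias
    print(weak_supervision_losses(pred_hr, pred_lr, gt_lr, factor=2))
```

In training, the three terms would typically be combined into a weighted sum. The abstract does not state the weights, so they are not fixed here.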

Toward Better SSIM Loss for Unsupervised Monocular Depth Estimation

Monocular Depth Estimation Based on Unsupervised Learning

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Unsupervised monocular depth learning using self-teaching and contrast-enhanced SSIM loss

Towards Loss Balance and Consistent Model in Self-supervised Monocular Depth Estimation

Rethinking Training Objective for Self-Supervised Monocular Depth Estimation - Semantic Cues to Rescue.

Self-Supervised Monocular Depth Estimation: Solving the Edge-Fattening Problem

Digging Into Self-Supervised Monocular Depth Estimation

Self-Supervised Monocular Depth Estimation With Multiscale Perception

Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation

Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement

Adaptive Semantic Fusion Framework for Unsupervised Monocular Depth Estimation

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

Deeper into Self-Supervised Monocular Indoor Depth Estimation

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data

FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation

Self-Supervised Learning based Depth Estimation from Monocular Images

CbwLoss: Constrained Bidirectional Weighted Loss for Self-Supervised Learning of Depth and Pose