Abstract: Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on large amounts of dense, high-resolution (HR) ground-truth depth maps. In practice, however, color images are usually captured at much higher resolution than depth maps, which leads to a resolution mismatch. In this paper, we propose a novel weakly supervised framework that trains a monocular depth estimation network to generate HR depth maps under resolution-mismatched supervision, i.e., the inputs are HR color images and the ground truth consists of low-resolution (LR) depth maps. The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network for distillation. Specifically, for the monocular depth estimation network, the input color image is first downsampled to obtain an LR version with the same resolution as the ground-truth depth. Then, both the HR and LR color images are fed into the monocular depth estimation network to obtain the corresponding estimated depth maps. We introduce three losses to train the network: 1) a reconstruction loss between the estimated LR depth and the ground-truth LR depth; 2) a reconstruction loss between the downsampled estimated HR depth and the ground-truth LR depth; 3) a consistency loss between the estimated LR depth and the downsampled estimated HR depth. In addition, we design a depth-to-depth reconstruction network. Through a distillation loss, the features of the two networks maintain structural consistency in affinity space, which ultimately improves the performance of the estimation network. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes and is competitive with, or even better than, supervised ones.
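
To make the three training losses concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the distance measure, the downsampling operator, or the loss weights, so the L1 distance, average pooling, and equal weights used here are assumptions, as are the hypothetical names `downsample`, `l1`, and `total_loss`. Depth maps are represented as plain nested lists to keep the example dependency-free.

```python
# Sketch of the three losses named in the abstract.
# Assumptions (not stated in the source): L1 distance, average-pool
# downsampling, equal loss weights, depth maps as nested lists of floats.


def downsample(depth, factor):
    """Average-pool a 2D depth map by an integer factor."""
    h, w = len(depth), len(depth[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [depth[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def l1(a, b):
    """Mean absolute difference between two equally sized 2D maps."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def total_loss(pred_hr, pred_lr, gt_lr, factor, weights=(1.0, 1.0, 1.0)):
    """Combine the two reconstruction losses and the consistency loss."""
    pred_hr_down = downsample(pred_hr, factor)
    loss_lr = l1(pred_lr, gt_lr)           # 1) estimated LR depth vs. LR ground truth
    loss_hr = l1(pred_hr_down, gt_lr)      # 2) downsampled HR estimate vs. LR ground truth
    loss_cons = l1(pred_lr, pred_hr_down)  # 3) LR estimate vs. downsampled HR estimate
    w1, w2, w3 = weights                   # hypothetical weights, not from the paper
    return w1 * loss_lr + w2 * loss_hr + w3 * loss_cons


# Toy usage: a 4x4 "HR" prediction supervised by a 2x2 "LR" ground truth.
pred_hr = [[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]
pred_lr = [[2, 4], [6, 8]]
gt_lr = [[1, 3], [5, 7]]
loss = total_loss(pred_hr, pred_lr, gt_lr, factor=2)
```

In this toy case the downsampled HR prediction matches the ground truth exactly, so only the LR reconstruction term and the consistency term contribute. In a real training setup these terms would be computed on network outputs with a differentiable framework, and the distillation loss from the depth reconstruction network would be added on top.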

Self-supervised monocular depth estimation on construction sites in low-light conditions and dynamic scenes

Self-supervised Monocular Depth Estimation for All Day Images Using Domain Separation

Monocular Depth Estimation Based on Unsupervised Learning

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

A self‐supervised monocular depth estimation model with scale recovery and transfer learning for construction scene analysis

A Lightweight Self-Supervised Training Framework for Monocular Depth Estimation

Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos

Resolution-sensitive self-supervised monocular absolute depth estimation

MBUDepthNet: Real-Time Unsupervised Monocular Depth Estimation Method for Outdoor Scenes

SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes

MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask

Digging Into Self-Supervised Monocular Depth Estimation

Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data

Deeper into Self-Supervised Monocular Indoor Depth Estimation

Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning

Aligning Technology and HRD strategy: A practical Framework to support technology transfer

Self-Supervised Monocular Depth Estimation With Multiscale Perception