Abstract: Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on a large amount of dense, high-resolution (HR) ground-truth depth maps. In practice, however, color images are usually captured at a much higher resolution than depth maps, which leads to a resolution mismatch. In this paper, we propose a novel weakly supervised framework that trains a monocular depth estimation network to generate HR depth maps from resolution-mismatched supervision, i.e., the inputs are HR color images and the ground-truth depth maps are low-resolution (LR). The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network used for distillation. Specifically, for the monocular depth estimation network, the input color image is first downsampled to an LR version with the same resolution as the ground-truth depth. Both the HR and LR color images are then fed into the monocular depth estimation network to obtain the corresponding estimated depth maps. We introduce three losses to train the network: 1) a reconstruction loss between the estimated LR depth and the ground-truth LR depth; 2) a reconstruction loss between the downsampled estimated HR depth and the ground-truth LR depth; and 3) a consistency loss between the estimated LR depth and the downsampled estimated HR depth. In addition, we design a depth-to-depth reconstruction network. Through a distillation loss, the features of the two networks are kept structurally consistent in affinity space, which in turn improves the performance of the estimation network. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes and is competitive with, or even better than, supervised ones.
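The three training losses and the affinity-based distillation described in the abstract can be sketched in plain Python as follows. This is a minimal, hypothetical sketch and not the authors' implementation. It assumes that the network outputs are already available as nested lists, that downsampling is non-overlapping average pooling by an integer `factor`, that the reconstruction and consistency losses are mean absolute (L1) errors, and that the distillation loss is a mean squared error between cosine-similarity affinity matrices of per-pixel features. All function names (`avg_pool`, `resolution_mismatch_losses`, `affinity`, `distillation_loss`) are illustrative.

```python
"""Hypothetical sketch of the losses described in the abstract.

Assumptions (not taken from the paper): maps are nested lists, downsampling is
average pooling, reconstruction/consistency losses are L1, and distillation
compares cosine-similarity affinity matrices with a mean squared error.
"""
import math
from typing import List, Tuple

Grid = List[List[float]]


def avg_pool(depth: Grid, factor: int) -> Grid:
    """Downsample an H x W map by averaging non-overlapping factor x factor blocks."""
    h, w = len(depth), len(depth[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [depth[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def l1(a: Grid, b: Grid) -> float:
    """Mean absolute difference between two maps of identical shape."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def resolution_mismatch_losses(
    pred_hr: Grid, pred_lr: Grid, gt_lr: Grid, factor: int
) -> Tuple[float, float, float]:
    """Return (LR reconstruction, downsampled-HR reconstruction, consistency) losses."""
    pred_hr_down = avg_pool(pred_hr, factor)
    loss_lr = l1(pred_lr, gt_lr)           # 1) estimated LR vs ground-truth LR
    loss_hr = l1(pred_hr_down, gt_lr)      # 2) downsampled estimated HR vs ground-truth LR
    loss_cons = l1(pred_lr, pred_hr_down)  # 3) estimated LR vs downsampled estimated HR
    return loss_lr, loss_hr, loss_cons


def affinity(features: List[List[float]]) -> Grid:
    """Pairwise cosine similarity between per-pixel feature vectors (N x C -> N x N)."""
    # Zero vectors get norm 1.0 so they yield zero similarity instead of dividing by zero.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(features, norms)]
        for fi, ni in zip(features, norms)
    ]


def distillation_loss(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean squared difference between student and teacher affinity matrices.

    Hypothetical roles: 'teacher' features come from the depth-to-depth
    reconstruction network, 'student' features from the estimation network.
    """
    a_s, a_t = affinity(student), affinity(teacher)
    diffs = [(x - y) ** 2 for rs, rt in zip(a_s, a_t) for x, y in zip(rs, rt)]
    return sum(diffs) / len(diffs)
```

For example, with a 4x4 HR prediction and a 2x2 LR ground truth, `resolution_mismatch_losses(pred_hr, pred_lr, gt_lr, factor=2)` returns three scalar losses that a training loop would weight and sum; the abstract does not specify those weights. Comparing affinity matrices rather than raw features makes the distillation term insensitive to per-pixel feature scale, since cosine similarity ignores vector magnitude.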

Unsupervised detail-preserving network for high quality monocular depth estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Unsupervised Learning

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Unsupervised High-Resolution Depth Learning from Videos with Dual Networks

High Quality Monocular Depth Estimation Via A Multi-Scale Network And A Detail-Preserving Objective

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

Efficient Edge-Preserving Multi-View Stereo Network for Depth Estimation

Unsupervised Monocular Depth Estimation with Encoder-decoder Network

LapUNet: a novel approach to monocular depth estimation using dynamic laplacian residual U-shape networks

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation

Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Self-supervised monocular depth estimation based on image texture detail enhancement

Monocular Depth Estimation With Affinity, Vertical Pooling, And Label Enhancement

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

DCU-NET: Self-supervised Monocular Depth Estimation Based on Densely Connected U-shaped Convolutional Neural Networks