Abstract: Existing methods for scale-invariant monocular depth estimation (SI MDE) often struggle due to the complexity of the task and to limited, non-diverse datasets, which hinders generalizability in real-world scenarios. In contrast, shift-and-scale-invariant (SSI) depth estimation, which simplifies the task and enables training with abundant stereo datasets, achieves high performance. We present a novel approach that leverages SSI inputs to enhance SI depth estimation, streamlining the network's role and facilitating in-the-wild generalization for SI depth estimation while only using a synthetic dataset for training. Emphasizing the generation of high-resolution details, we introduce a novel sparse ordinal loss that substantially improves detail generation in SSI MDE, addressing critical limitations in existing approaches. Through in-the-wild qualitative examples and zero-shot evaluation, we substantiate the practical utility of our approach in computational photography applications, showcasing its ability to generate highly detailed SI depth maps and achieve generalization in diverse scenarios.

What problem does this paper attempt to address?

This paper attempts to address the problem of achieving high-resolution scale-invariant monocular depth estimation (SI MDE) that generalizes to diverse, in-the-wild scenes. Specifically, existing methods face the following challenges when handling this task:

1. **Dataset Limitations**: Existing scale-invariant monocular depth estimation methods struggle to achieve the boundary accuracy and generalization required for photographic applications due to the lack of high-resolution, large-scale, and diverse training datasets.
2. **Detail Generation**: Existing methods are insufficient in generating high-resolution details, especially in complex scenes.
3. **Geometric Accuracy**: Although scale-and-shift-invariant (SSI) depth estimation excels in generating high-resolution details, its geometric accuracy is lacking, making it unsuitable for computer graphics applications.

To address these issues, the authors propose a new method that feeds SSI depth estimates, which can be trained on abundant stereo datasets, into a scale-invariant depth estimation network that is itself trained only on a synthetic dataset. The specific steps are as follows:

1. **Initial SSI Depth Estimation**: First, use low-resolution SSI depth estimation to capture the overall structure of the scene.
2. **High-Resolution SSI Depth Estimation**: Then, use high-resolution SSI depth estimation to capture fine depth discontinuities.
3. **Information Fusion**: Input this rich structural information into a scale-invariant depth estimation network to regress high-resolution scale-invariant monocular depth.

To improve the performance of SSI depth estimation, the authors introduce a new sparse ordinal loss, which significantly enhances detail generation and boundary accuracy. In this way, the authors' method can generate highly detailed scale-invariant depth maps in various scenes with good generalization ability.
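The summary above does not give the form of the sparse ordinal loss. As a rough illustration of the general idea behind ordinal (ranking) supervision, the sketch below implements a generic pairwise ranking loss over a sparse set of randomly sampled pixel pairs. It is not the authors' formulation: the function name `pairwise_ordinal_loss`, the random sampling scheme, and the parameters `num_pairs` and `tau` are all assumptions made for illustration.

```python
import numpy as np

def pairwise_ordinal_loss(pred, gt, num_pairs=1000, tau=0.02, rng=None):
    """Generic pairwise ranking loss on sampled pixel pairs.

    Hypothetical sketch, NOT the paper's sparse ordinal loss.
    pred, gt: arrays of the same shape; gt must be positive.
    """
    # A fixed default seed keeps the sampled pairs reproducible.
    rng = np.random.default_rng(0) if rng is None else rng
    p = pred.ravel()
    g = gt.ravel()

    # Sample a sparse set of pixel pairs (i, j).
    i = rng.integers(0, p.size, size=num_pairs)
    j = rng.integers(0, p.size, size=num_pairs)

    # Ordinal label from ground truth: +1 if i is farther than j,
    # -1 if i is closer, 0 if the two depths are within a ratio of tau.
    ratio = g[i] / g[j]
    label = np.where(ratio > 1.0 + tau, 1.0,
                     np.where(ratio < 1.0 / (1.0 + tau), -1.0, 0.0))

    diff = p[i] - p[j]
    # Ranked pairs: log(1 + exp(-label * diff)) penalizes wrong ordering.
    ranked = np.logaddexp(0.0, -label * diff)
    # Roughly equal pairs: pull the two predictions together.
    equal = diff ** 2
    return float(np.mean(np.where(label != 0.0, ranked, equal)))
```

Because the loss depends only on the ordering of sampled pairs, a prediction that preserves the ground-truth ordering scores lower than one that reverses it, regardless of the absolute depth values.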
In summary, this paper aims to address the shortcomings of existing methods in detail generation and generalization in high-resolution, complex scenes by combining the advantages of SSI depth estimation, thereby achieving high-quality depth estimation for applications such as computational photography.
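The distinction between scale-invariant (SI) and scale-and-shift-invariant (SSI) estimates can be made concrete with a small alignment example. The sketch below is an illustration under stated assumptions, not code from the paper: an SI prediction is taken to be correct up to one unknown scale factor, an SSI prediction up to an unknown scale and an unknown additive shift, and the helper names `align_scale` and `align_scale_shift` are hypothetical. In practice SSI methods often apply this alignment in inverse-depth (disparity) space; here it is applied directly to the toy arrays.

```python
import numpy as np

def align_scale(pred, gt):
    """Least-squares scale s minimizing ||s * pred - gt||^2."""
    s = np.sum(pred * gt) / np.sum(pred * pred)
    return s * pred

def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t minimizing ||s * pred + t - gt||^2."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s * pred + t

def max_abs_err(a, b):
    return float(np.max(np.abs(a - b)))

rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 10.0, size=(4, 4))   # toy ground-truth map
si_pred = 0.5 * gt                          # correct up to scale only
ssi_pred = 0.5 * gt + 3.0                   # correct up to scale and shift

# A single scale recovers the SI prediction exactly ...
err_si = max_abs_err(align_scale(si_pred, gt), gt)
# ... but cannot undo the unknown shift in the SSI prediction,
err_ssi_scale_only = max_abs_err(align_scale(ssi_pred, gt), gt)
# which needs both a scale and a shift to be aligned.
err_ssi = max_abs_err(align_scale_shift(ssi_pred, gt), gt)
```

The unknown shift is what limits the geometric accuracy of SSI output: without ground truth to fit against, a scale factor alone cannot turn it into a geometrically consistent map.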

Scale-Invariant Monocular Depth Estimation via SSI Depth

Self-Supervised Monocular Depth Estimation With Multiscale Perception

Towards Zero-Shot Scale-Aware Monocular Depth Estimation

Monocular Depth Estimation via Self-Supervised Self-Distillation

ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation

Unsupervised Scale-Consistent Depth Learning from Video

DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation

Resolution-sensitive self-supervised monocular absolute depth estimation

Monocular Depth Estimation Using Cues Inspired by Biological Vision Systems

Self-Supervised Monocular Depth Estimation With Positional Shift Depth Variance and Adaptive Disparity Quantization

MSFNet: Multi-scale features network for monocular depth estimation

Towards Scale-Aware Self-Supervised Multi-Frame Depth Estimation with IMU Motion Dynamics

RealMonoDepth: Self-Supervised Monocular Depth Estimation for General Scenes

Self-supervised learning monocular depth estimation from internet photos

Depth Estimation from Multi-Scale SLIC Superpixels Using Non-Parametric Learning

Addressing the Scale Shrinkage Problem in Learning-based Binocular Depth Estimation

SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation

FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene