DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

2023-08-29

Abstract: Monocular depth estimation is a challenging task that predicts the pixel-wise depth from a single 2D image. Current methods typically model this problem as a regression or classification task. We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process. It learns an iterative denoising process to "denoise" a random depth distribution into a depth map with the guidance of monocular visual conditions. The process is performed in the latent space encoded by a dedicated depth encoder and decoder. Instead of diffusing ground truth (GT) depth, the model learns to reverse the process of diffusing its own refined depth into a random depth distribution. This self-diffusion formulation overcomes the difficulty of applying generative models to sparse GT depth scenarios. The proposed approach benefits this task by refining the depth estimate step by step, which is well suited to generating accurate and highly detailed depth maps. Experimental results on the KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper tackles monocular depth estimation: predicting per-pixel depth from a single 2D image. The authors propose **DiffusionDepth**, which recasts the task as a denoising diffusion process. The method produces detailed depth maps by iteratively "denoising" a random depth distribution under the guidance of visual conditions extracted from the input image.

#### Main Contributions:

1. **New Method Framework**: Reformulates monocular depth estimation as an iterative diffusion denoising problem guided by visual conditions.
2. **Strong Performance**: Reports state-of-the-art (SOTA) results on the KITTI and NYU-Depth-V2 datasets with acceptable inference time.
3. **First Introduction of Diffusion Models**: Presented as the first application of diffusion models to monocular depth estimation, with a detailed component analysis and insights that may carry over to related 3D vision tasks.

#### Key Technical Points:

- **Self-Diffusion Process**: To address mode collapse in generative models when the ground truth depth is sparse (sparse GT depth), DiffusionDepth diffuses its own refined depth rather than the GT depth and learns to reverse that process.
- **Denoising Block Design**: The iterative denoising process is implemented by the Monocular Conditioned Denoising Block (MCDB), which gradually refines the depth map.

Together, these components let DiffusionDepth reach state-of-the-art results in the reported experiments and offer a generative perspective on monocular depth estimation.
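The iterative denoising idea can be illustrated with a toy sketch. This is not the authors' implementation: the linear noise schedule, the deterministic DDIM-style update, and every function name below are assumptions chosen for illustration, and the learned, image-conditioned denoiser (the MCDB in the paper) is replaced by a hypothetical oracle that already knows the target depth. The sketch only shows the mechanics of starting from random values and refining them step by step into a depth estimate.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(x0, alpha_bar, rng):
    """Forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * noise.

    In the self-diffusion setting, x0 would be the model's own refined
    depth latent rather than the sparse ground truth.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x0]


def oracle_noise_predictor(x_t, alpha_bar, condition):
    """Hypothetical stand-in for the learned conditioned denoiser.

    A trained network would infer the noise from x_t and image features.
    Here `condition` is simply the target depth, so the noise is exact.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - a * c) / b for x, c in zip(x_t, condition)]


def denoise(x_t, alpha_bars, condition):
    """Reverse process with a deterministic DDIM-style update (an assumption)."""
    x = list(x_t)
    for t in range(len(alpha_bars) - 1, -1, -1):
        abar_t = alpha_bars[t]
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_predictor(x, abar_t, condition)
        # Estimate the clean signal implied by the predicted noise.
        x0_pred = [(xi - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
                   for xi, e in zip(x, eps)]
        # Move to the previous (less noisy) step.
        x = [math.sqrt(abar_prev) * p + math.sqrt(1.0 - abar_prev) * e
             for p, e in zip(x0_pred, eps)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    toy_depth = [1.5, 2.0, 8.0, 40.0]            # made-up depth values
    start = [rng.gauss(0.0, 1.0) for _ in toy_depth]  # random initial "depth"
    print(denoise(start, make_alpha_bars(50), toy_depth))
```

Because the oracle predictor is exact, the loop recovers the toy depth values up to floating-point error. With a learned denoiser the estimate would instead improve gradually over the steps, which is the step-by-step refinement the summary describes.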

Monocular Depth Estimation using Diffusion Models

Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation

The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage

PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

NDDepth: Normal-Distance Assisted Monocular Depth Estimation

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Digging into contrastive learning for robust depth estimation with diffusion models

Monocular Depth Estimation with Guidance of Surface Normal Map

DepthFM: Fast Monocular Depth Estimation with Flow Matching

ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation

DDP: Diffusion Model for Dense Visual Prediction

EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation