DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu
2023-08-29
Abstract: Monocular depth estimation is a challenging task that predicts the pixel-wise depth from a single 2D image. Current methods typically model this problem as a regression or classification task. We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process. It learns an iterative denoising process to "denoise" random depth distribution into a depth map with the guidance of monocular visual conditions. The process is performed in the latent space encoded by a dedicated depth encoder and decoder. Instead of diffusing ground truth (GT) depth, the model learns to reverse the process of diffusing the refined depth of itself into random depth distribution. This self-diffusion formulation overcomes the difficulty of applying generative models to sparse GT depth scenarios. The proposed approach benefits this task by refining depth estimation step by step, which is superior for generating accurate and highly detailed depth maps. Experimental results on KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper tackles monocular depth estimation: predicting pixel-wise depth from a single 2D image. The authors propose **DiffusionDepth**, which reformulates the task as a denoising diffusion process. The method produces a detailed depth map by iteratively "denoising" a random depth distribution under the guidance of visual conditions extracted from the input image.

#### Main Contributions:

1. **New Method Framework**: Reformulates monocular depth estimation as an iterative diffusion denoising problem guided by visual conditions.
2. **Outstanding Performance**: Reaches state-of-the-art (SOTA) performance on the KITTI and NYU-Depth-V2 datasets with acceptable inference time.
3. **First Introduction of Diffusion Models**: The work is presented as the first application of diffusion models to monocular depth estimation. It also provides a detailed component analysis and insights that may carry over to related 3D vision tasks.

#### Key Technical Points:

- **Self-Diffusion Process**: To address mode collapse in generative models when the ground truth depth is sparse (sparse GT depth), DiffusionDepth diffuses its own refined depth instead of the GT depth and learns to reverse that process.
- **Denoising Block Design**: The iterative denoising process is implemented by the Monocular Conditioned Denoising Block (MCDB), which gradually refines the depth map.

Together, these components allow DiffusionDepth to reach strong experimental results and offer a new perspective on monocular depth estimation.
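The iterative denoising idea described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the linear noise schedule, the deterministic DDIM-style update, and the names `make_alpha_bars`, `diffuse`, `toy_denoiser`, and `denoise` are all assumptions made for illustration. In particular, `toy_denoiser` is a hypothetical stand-in for a learned, image-conditioned block such as the MCDB; it "cheats" by treating the condition as the clean depth latent so that the loop can run without a trained network.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def diffuse(x0, alpha_bar, rng):
    """Forward process: blend a clean latent with Gaussian noise.

    In a self-diffusion setup, x0 would be the model's own refined depth
    latent rather than the sparse ground truth depth.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


def toy_denoiser(x_t, alpha_bar, condition):
    """Hypothetical noise predictor. A real model would infer the noise from
    image features; this toy uses `condition` as the clean latent directly."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - a * c) / s for x, c in zip(x_t, condition)]


def denoise(condition, alpha_bars, rng):
    """Deterministic DDIM-style reverse loop, starting from pure noise."""
    x = [rng.gauss(0.0, 1.0) for _ in condition]
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = toy_denoiser(x, ab_t, condition)
        # Estimate the clean latent implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                  for xi, e in zip(x, eps)]
        # Step to the previous (less noisy) level using the same noise estimate.
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars(50)
    depth_latent = [1.5, 2.0, 8.25]  # toy "depth" values, purely illustrative
    noisy = diffuse(depth_latent, alpha_bars[-1], rng)
    refined = denoise(depth_latent, alpha_bars, rng)
    print("noisy:  ", noisy)
    print("refined:", refined)
```

Because the toy predictor has access to the clean latent, the loop recovers it almost exactly; with a learned predictor the quality of the result would depend on how well the network infers depth from the image condition. The sketch operates on a short list of numbers, whereas the paper describes running the process in a latent space produced by a dedicated depth encoder and decoder.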