Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
2024-07-24
Abstract: We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task. Starting with images that facilitate depth prediction due to the absence of unfavorable factors, we systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information. This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control, known for synthesizing high-quality image content from textual prompts while preserving the coherence of 3D structure between generated and source imagery. Subsequent fine-tuning of any monocular depth network is carried out through a self-distillation protocol that takes into account images generated using our strategy and its own depth predictions on simple, unchallenging scenes. Experiments on benchmarks tailored for our purposes demonstrate the effectiveness and versatility of our proposal.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses monocular depth estimation under challenging conditions, in particular adverse capture conditions (such as rain, snow, and nighttime) and non-Lambertian surfaces (such as transparent or specular objects). The authors use a text-to-image diffusion model with depth-aware control to generate challenging versions of easy scenes that keep the same 3D structure, and then fine-tune an existing monocular depth network through a self-distillation protocol: the network's own depth predictions on the easy images serve as labels for the generated images. This makes the network more robust on complex, out-of-distribution data.

The main contributions of the paper are as follows:

1. **Innovative use of diffusion models**: The authors present this as the first use of diffusion models to tackle the challenges that adverse weather conditions and non-Lambertian surfaces pose for monocular depth estimation.
2. **Improving the robustness of existing models**: By distilling the knowledge of the diffusion model, the method improves the performance of existing monocular depth estimation models under out-of-distribution conditions.
3. **Unified handling of multiple challenges**: The method addresses several adverse conditions at once, such as harsh weather and non-Lambertian surfaces, and achieves results competitive with solutions specialized for a single condition.

The experimental section evaluates the method on multiple datasets, including autonomous driving datasets (such as nuScenes and RobotCar) and non-Lambertian surface datasets (such as Booster and ClearGrasp). The reported results show that the fine-tuned models improve across conditions, most notably in challenging settings such as nighttime and rain.
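The self-distillation protocol described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: `teacher` stands in for the depth network applied to easy images, `generate_challenging` is a hypothetical placeholder for the depth-conditioned diffusion model, and the scale-and-shift-invariant L1 loss is an assumed choice that is common for relative depth, not one confirmed by this summary. Images and depth maps are flattened lists of floats to keep the example self-contained.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy stand-in: flattened pixel intensities
Depth = List[float]  # toy stand-in: flattened per-pixel depth


def build_distillation_pairs(
    easy_images: Sequence[Image],
    teacher: Callable[[Image], Depth],
    generate_challenging: Callable[[Image, Depth], Image],
) -> List[Tuple[Image, Depth]]:
    """Pair each generated hard image with the depth predicted on its easy source."""
    pairs = []
    for image in easy_images:
        # The network's own prediction on the easy image becomes the pseudo-label.
        pseudo_depth = teacher(image)
        # Hypothetical generator: it should change appearance (rain, night, ...)
        # while keeping the 3D structure described by pseudo_depth.
        hard_image = generate_challenging(image, pseudo_depth)
        pairs.append((hard_image, pseudo_depth))
    return pairs


def scale_shift_invariant_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error after least-squares alignment of pred to target.

    Assumed loss for illustration: it ignores a global scale and shift, which
    relative-depth networks do not predict consistently.
    """
    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    cov_pt = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])
    scale = cov_pt / var_p if var_p > 0 else 0.0
    shift = mean_t - scale * mean_p
    return fmean([abs(scale * p + shift - t) for p, t in zip(pred, target)])
```

During fine-tuning, the student would predict depth on each generated image and be penalized with a loss of this kind against the paired pseudo-label, so no ground-truth depth for the challenging scenes is needed.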