Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

Runze Liu, Dongchen Zhu, Guanghui Zhang, Yue Xu, Wenjun Shi, Xiaolin Zhang, Lei Wang, Jiamao Li

2024-06-14

Abstract: Unsupervised monocular depth estimation has received widespread attention because it can be trained without ground truth. In real-world scenarios, images may be blurry or noisy due to weather conditions and the inherent limitations of the camera, so it is particularly important to develop a robust depth estimation model. Benefiting from the training strategies of generative networks, generative-based methods often exhibit enhanced robustness. In light of this, we employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation. Additionally, we propose a hierarchical feature-guided denoising module. This module significantly enriches the model's capacity for learning and interpreting depth distribution by fully leveraging image features to guide the denoising process. Furthermore, we explore the implicit depth within reprojection and design an implicit depth consistency loss. This loss function enhances the performance of the model and ensures the scale consistency of depth within a video sequence. We conduct experiments on the KITTI, Make3D, and our self-collected SIMIT datasets. The results indicate that our approach stands out among generative-based models, while also showcasing remarkable robustness.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper attempts to address the problem of improving the robustness of models in monocular depth estimation, especially when dealing with images captured under conditions of blur, noise, or adverse weather. Specifically:

1. **Improving Robustness**: Existing discriminative methods perform well on ideally clear images but show poor performance on blurred or noisy images in real-world scenarios. The paper proposes a generative network-based approach to enhance the model's adaptability to different image qualities.
2. **Leveraging the Advantages of Generative Networks**: Generative networks, by learning the joint probability distribution between images and depth, can better understand and interpret the intrinsic depth distribution, thereby exhibiting stronger robustness and adaptability when faced with new data samples.
3. **Proposing a New Framework and Loss Function**: The authors propose an unsupervised monocular depth estimation framework based on a diffusion model and design a Hierarchical Feature Guided Denoising (HFGD) module, as well as an implicit depth consistency loss (Ldc), to enhance the model's performance and ensure consistency in estimated depth within video sequences.

In summary, the paper aims to develop an unsupervised monocular depth estimation method that maintains high robustness in complex scenarios.
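To make the diffusion idea concrete, the following is a minimal sketch of the standard DDPM forward (noising) process applied to a toy depth map, together with the closed-form recovery of the clean map from a noise estimate. It is not the authors' implementation: the linear beta schedule, the step count, the function names (`make_alpha_bar`, `q_sample`, `predict_x0`), and the toy depth map are all assumptions chosen for illustration. In the paper, the noise estimate would come from a denoising network guided by image features (the HFGD module); here that network is replaced by the true noise purely to show the arithmetic.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(depth0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * depth0 + np.sqrt(1.0 - a) * noise

def predict_x0(depth_t, t, alpha_bar, eps_pred):
    """Invert the forward process given an estimate of the added noise."""
    a = alpha_bar[t]
    return (depth_t - np.sqrt(1.0 - a) * eps_pred) / np.sqrt(a)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical normalized depth map standing in for a real prediction target.
depth0 = rng.uniform(0.0, 1.0, size=(4, 6))
noise = rng.standard_normal(depth0.shape)

t = 500
depth_t = q_sample(depth0, t, alpha_bar, noise)

# Stand-in for the feature-guided denoiser: a perfect noise estimate.
depth_rec = predict_x0(depth_t, t, alpha_bar, noise)
```

With a perfect noise estimate the clean depth map is recovered up to floating-point error; a learned denoiser only approximates the noise, which is why sampling in practice proceeds over many small reverse steps rather than a single inversion.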

Monocular Depth Estimation Based on Unsupervised Learning

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model

Monocular Depth Estimation using Diffusion Models

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Unsupervised detail-preserving network for high quality monocular depth estimation

Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention

An Adaptive Unsupervised Learning Framework For Monocular Depth Estimation

MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask

Monocular depth estimation with hierarchical fusion of dilated CNNs and soft-weighted-sum inference

Self‐supervised Monocular Depth Estimation Via Asymmetric Convolution Block

Self-supervised Monocular Depth Estimation with Uncertainty-aware Feature Enhancement and Depth Fusion

Semi-Supervised Adversarial Monocular Depth Estimation

Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation

Digging Into Self-Supervised Monocular Depth Estimation

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss