Abstract: Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models. Equipped with our module, monocular models can be stably trained with over 8 million images with thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular-SLAM (Fig. 1), leading to high-quality metric scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.
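
The abstract does not spell out how the canonical camera space transformation works, so the following is only a minimal sketch of one plausible reading: depth labels from a camera with focal length f are rescaled by the ratio f_c / f into the space of a fixed canonical camera, and predictions are mapped back with the inverse ratio. The function names, the constant `CANONICAL_FOCAL_PX`, and its value are hypothetical and chosen for illustration; they are not taken from the released code.

```python
# Hypothetical canonical focal length in pixels (an assumption for this sketch).
CANONICAL_FOCAL_PX = 1000.0


def to_canonical(depth_m, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Rescale a metric depth map (list of rows, in meters) into canonical camera space.

    An object of fixed size looks the same at depth d with focal length f as at
    depth d * f_c / f with focal length f_c, so labels are multiplied by f_c / f.
    """
    scale = canonical_focal_px / focal_px
    return [[d * scale for d in row] for row in depth_m]


def from_canonical(depth_c, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Map a canonical-space depth map back to metric depth for the real camera."""
    scale = canonical_focal_px / focal_px
    return [[d / scale for d in row] for row in depth_c]


# Example: a camera with a 500 px focal length, canonical focal length 1000 px.
labels = [[2.0, 4.0]]
canonical = to_canonical(labels, focal_px=500.0)
restored = from_canonical(canonical, focal_px=500.0)
```

Under this assumption, training on canonical-space labels removes the focal-length dependence that otherwise makes identical image content correspond to different metric depths across cameras; at inference, the known focal length of the input camera is used to undo the scaling.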

MetricDepth: Enhancing Monocular Depth Estimation with Deep Metric Learning

Depth-discriminative Metric Learning for Monocular 3D Object Detection

Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective

UniDepth: Universal Monocular Metric Depth Estimation

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Lifelong-MonoDepth: Lifelong Learning for Multi-Domain Monocular Metric Depth Estimation

Depth from Defocus Via Discriminative Metric Learning

ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

DME: Unveiling the Bias for Better Generalized Monocular Depth Estimation

DPDFormer: A Coarse-to-Fine Model for Monocular Depth Estimation

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Depth Is All You Need for Monocular 3D Detection

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Spatiotemporally Enhanced Photometric Loss for Self-Supervised Monocular Depth Estimation

MonoCD: Monocular 3D Object Detection with Complementary Depths

Deep Localized Metric Learning

Deep Metric Learning with Dynamic Margin Hard Sampling Loss for Face Verification

Semi-Supervised Adversarial Monocular Depth Estimation