PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

2024-06-11

Abstract: This paper introduces PatchRefiner, an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. While depth estimation is crucial for applications such as autonomous driving, 3D generative modeling, and 3D reconstruction, achieving accurate high-resolution depth in real-world scenarios is challenging due to the constraints of existing architectures and the scarcity of detailed real-world depth data. PatchRefiner adopts a tile-based methodology, reconceptualizing high-resolution depth estimation as a refinement process, which results in notable performance enhancements. Utilizing a pseudo-labeling strategy that leverages synthetic data, PatchRefiner incorporates a Detail and Scale Disentangling (DSD) loss to enhance detail capture while maintaining scale accuracy, thus facilitating the effective transfer of knowledge from synthetic to real-world data. Our extensive evaluations demonstrate PatchRefiner's superior performance, significantly outperforming existing benchmarks on the Unreal4KStereo dataset by 18.1% in terms of the root mean squared error (RMSE) and showing marked improvements in detail accuracy and consistent scale estimation on diverse real-world datasets like CityScape, ScanNet++, and ETH3D.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses metric depth estimation from a single (monocular) image in high-resolution real-world scenes. Specifically, the authors point out two challenges that existing methods face with high-resolution real-domain inputs:

1. **Resolution limitations of existing architectures**: Most state-of-the-art depth-estimation architectures are constrained by memory and computational resources when processing high-resolution images.
2. **Scarcity of high-quality real-world depth data**: High-resolution real-world depth datasets are very scarce. Existing datasets are usually of low resolution and often lack ground-truth data, especially near object boundaries.

To solve these problems, the authors propose a new framework named PatchRefiner, which improves high-resolution depth estimation in the following ways:

- **Tile-based method**: Reconceptualizes the high-resolution depth-estimation task as a refinement process and adopts a tile-based method to handle high-resolution inputs.
- **Pseudo-label strategy**: Uses a model trained on synthetic data to generate pseudo-labels, compensating for the scarcity of detailed real-world data.
- **Detail and Scale Disentangling (DSD) loss**: Introduces a new loss function that combines ranking supervision with scale invariance, transferring knowledge from synthetic data to real-world data and enhancing the capture of details while maintaining scale accuracy.

With these improvements, PatchRefiner significantly outperforms existing methods on multiple benchmark datasets. In particular, on the synthetic Unreal4KStereo dataset, it reduces RMSE by 18.1% and REL by 15.7%. It also performs well on real-world datasets such as CityScape, ScanNet++, and ETH3D, significantly improving the accuracy of boundary details and the consistency of scale estimation.
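
The reported figures are relative reductions in error metrics. The following is a minimal sketch of how such numbers are typically computed, assuming REL denotes the mean absolute relative error, as is conventional in depth estimation. The function names and the flat-list input format are illustrative assumptions, not taken from the paper.

```python
import math

def rmse(pred, gt):
    """Root mean squared error over paired depth values (same units, e.g. meters)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt), assuming every gt > 0."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def relative_reduction(baseline_error, new_error):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline_error - new_error) / baseline_error
```

Under this definition, an 18.1% RMSE reduction means the new RMSE is 0.819 times the baseline RMSE.
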
In summary, the paper tackles high-resolution real-domain monocular depth estimation through a new framework design and loss function, with the goal of improving model performance in real-world scenarios.
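
The idea of combining ranking supervision with scale invariance can be illustrated with a small sketch. This is not the paper's formulation of the DSD loss: it substitutes a generic pairwise ranking loss and a scale-invariant log loss, and every function name, the threshold `tau`, and the weight are hypothetical choices made for illustration. Depths are assumed to be positive values in flat lists, and `pairs` is a list of pixel-index pairs sampled by the caller.

```python
import math

def scale_invariant_log_loss(pred, target, lam=1.0):
    """Variance-style log error; with lam=1.0, rescaling pred globally leaves it unchanged."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    return mean_sq - lam * mean * mean

def ranking_loss(pred, pseudo, pairs, tau=0.03):
    """Pairwise ranking loss: pred should reproduce the depth ordering found in pseudo."""
    total = 0.0
    for i, j in pairs:
        ratio = pseudo[i] / pseudo[j]
        diff = pred[i] - pred[j]
        if ratio > 1.0 + tau:
            label = 1.0    # pseudo-label says point i is farther than point j
        elif ratio < 1.0 / (1.0 + tau):
            label = -1.0   # pseudo-label says point i is closer than point j
        else:
            label = 0.0    # roughly equal depth
        if label == 0.0:
            total += diff * diff  # pull near-equal pairs together
        else:
            total += math.log1p(math.exp(-label * diff))  # penalize wrong ordering
    return total / len(pairs)

def combined_detail_loss(pred, pseudo, pairs, weight=1.0):
    """Hypothetical combination: ordering term plus scale-invariant term (assumed weighting)."""
    return ranking_loss(pred, pseudo, pairs) + weight * scale_invariant_log_loss(pred, pseudo)
```

Neither term depends on the absolute scale of the pseudo-labels: the ranking term uses only their ordering, and the log term ignores a global scale factor. In such a setup, detail could be learned from the pseudo-labels without disturbing metric scale, which would be supervised separately.
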

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering

Double Refinement Network for Efficient Indoor Monocular Depth Estimation

Depth Refinement for Improved Stereo Reconstruction

High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation

Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation

Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth

Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild

Self-Distilled Depth Refinement with Noisy Poisson Fusion

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Boosting Monocular Depth Estimation with Sparse Guided Points

Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Global Depth Refinement Based on Patches

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation