Abstract:Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep learning-based methods, the challenges they address, and how they have evolved in their architecture and supervision methods. It provides a taxonomy for classifying the current work based on their input and output modalities, network architectures, and learning methods. It also discusses the major milestones in the history of monocular depth estimation, and different pipelines, datasets, and evaluation metrics used in existing methods.

What problem does this paper attempt to address?

The paper aims to address the problem of depth estimation in monocular images and videos, providing a comprehensive review based on deep learning methods. Specifically: 1. **Task Description**: Estimating the 3D geometric structure of a scene from a single RGB image or video. This task has a wide range of applications, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. 2. **Challenges**: - **Information Loss**: The loss of information during the 3D to 2D projection process makes it difficult for computers to accurately estimate depth. - **Camera Parameter Estimation**: Estimating the intrinsic and extrinsic parameters of the camera is required, but these parameters are often unavailable in practical applications. - **Lighting Inconsistency**: Variations in color and lighting between different input images can lead to different color distributions for the same depth map. - **Temporal Consistency in Videos**: Maintaining consistency and smoothness of depth between video frames is a challenge. - **Moving Objects and Camera Motion**: Videos contain both rigid and non-rigid moving objects, requiring the separation of camera motion from object motion. - **Training-Related Issues**: Obtaining accurate 3D ground truth data is difficult and time-consuming; models often lack generalization ability in new domains. - **Computational Resource Requirements**: Deep learning methods demand high memory and computation time, making implementation on resource-constrained devices challenging. 3. **Contributions**: - Provides an extensive classification system covering over 160 key papers, summarizing the input-output modalities, network architectures, supervision levels, datasets, and domain adaptation methods of existing approaches. - Discusses typical challenges in monocular depth estimation, particularly those specific to deep learning problems. - Proposes a general processing flow and baseline architecture for comparing network architectures in the literature, and contrasts existing methods based on supervision levels and loss functions. - Reviews commonly used datasets and summarizes over 20 datasets. - Discusses future research directions based on gaps identified in the survey. In summary, this paper provides a systematic analysis and guidance for future developments in the field of monocular depth estimation through a comprehensive review of existing research.

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Monocular Depth Estimation Based on Unsupervised Learning

Deep Learning-Based Monocular Depth Estimation Methods—A State-of-the-Art Review

Deep Learning based Monocular Depth Prediction: Datasets, Methods and Applications

A Survey on Deep Learning Techniques for Stereo-based Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Deep learning for monocular depth estimation: A review

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Monocular Depth Estimation Using Deep Learning: A Review

Monocular depth estimation based on deep learning: An overview

Monocular Depth Estimation: A Survey

Towards Real-Time Monocular Depth Estimation for Robotics: A Survey

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning.

Deep Learning-Based Stereopsis and Monocular Depth Estimation Techniques: A Review

UDepth: Fast Monocular Depth Estimation for Visually-guided Underwater Robots

Depth Estimation of Traffic Scenes from Image Sequence Using Deep Learning.

Depth Is All You Need for Monocular 3D Detection

Self-Supervised Learning based Depth Estimation from Monocular Images

Outdoor Monocular Depth Estimation: A Research Review

Deep Monocular Depth Estimation Based on Content and Contextual Features