Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving technology, scene perception has become increasingly important. Unsupervised deep-learning-based methods have demonstrated a degree of robustness and accuracy in some challenging scenes. Because these methods infer depth from a single input image without any ground-truth labels, they save considerable time and resources that would otherwise be spent on annotation. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments, which can be improved by modifying the network structure and incorporating information from other modalities. In this paper, we propose a fast and accurate unsupervised monocular depth estimation network, together with a learning framework that uses this network to improve depth estimation by incorporating images translated across modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. A sub-pixel convolutional layer replaces the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using both near-infrared and RGB images outperforms estimation from RGB images alone. During training, both images are first translated into the same modality, and super-resolved depth estimation is then performed for each stereo camera pair. Compared with the baseline results obtained from RGB images only, experiments show that our depth estimation network with the proposed cross-modal fusion system achieves better performance on public datasets and on a multi-modal dataset collected with our stereo vision sensor.
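The abstract says a sub-pixel convolutional layer replaces the up-sampling branches. As a minimal sketch of the general sub-pixel (pixel-shuffle) technique, not the authors' implementation, the code below first expands the channel count by a factor of r*r with a convolution and then rearranges each group of r*r channels into an r x r block of output pixels, raising the spatial resolution by r. Assumptions: feature maps are plain nested Python lists indexed [channel][row][col], a 1x1 convolution stands in for the layer's real kernel (the abstract does not give kernel sizes or channel counts), channel ordering follows the convention of PyTorch's nn.PixelShuffle, and the function names are hypothetical.

```python
from typing import List

# Hypothetical representation: a feature map indexed as [channel][row][col].
Tensor3D = List[List[List[float]]]


def conv1x1(x: Tensor3D, weights: List[List[float]]) -> Tensor3D:
    """1x1 convolution: output channel k is a weighted sum of all input channels.

    Stand-in (assumption) for the real convolution of a sub-pixel layer,
    which would typically use a larger kernel.
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(wk[c] * x[c][h][w] for c in range(len(x))) for w in range(width)]
            for h in range(height)
        ]
        for wk in weights
    ]


def pixel_shuffle(x: Tensor3D, r: int) -> Tensor3D:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Each group of r*r channels at low-resolution location (h, w) becomes an
    r x r block of pixels in the high-resolution output.
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])

    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):        # row offset inside the r x r block
            for j in range(r):    # column offset inside the r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


def subpixel_upsample(x: Tensor3D, weights: List[List[float]], r: int) -> Tensor3D:
    """Sub-pixel upsampling: expand channels by r*r, then trade channels for resolution."""
    return pixel_shuffle(conv1x1(x, weights), r)


# Usage: a 1-channel 1x2 map upsampled by r=2 into a 1-channel 2x4 map.
print(subpixel_upsample([[[1.0, 2.0]]], [[1.0], [2.0], [3.0], [4.0]], 2))
```

Compared with interpolation followed by convolution, this design performs the convolution at the low resolution and only rearranges values to reach the high resolution, so the learned filters decide how each sub-pixel position is filled.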

Search-Based Depth Estimation Via Coupled Dictionary Learning With Large-Margin Structure Inference

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Binocular Depth Estimation Using Convolutional Neural Network With Siamese Branches

Monocular Depth Estimation Based on Unsupervised Learning

Depth estimation for outdoor image using couple dictionary learning and region detection

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

Coupled Depth Learning

Size-to-depth: A New Perspective for Single Image Depth Estimation

DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation

Deep eyes: Joint depth inference using monocular and binocular cues

Depth Estimation from Multi-Scale SLIC Superpixels Using Non-Parametric Learning

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Monocular Depth Estimation Using Cues Inspired by Biological Vision Systems

Depth Insight -- Contribution of Different Features to Indoor Single-image Depth Estimation

WorDepth: Variational Language Prior for Monocular Depth Estimation

V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement

FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene

Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning

[Figure: network architecture consisting of a dense feature extractor, a scene understanding module (full-image encoder, ASPP, and convolutional layers), and ordinal regression, from input to output.]

Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning