Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving, scene perception has become increasingly important. Unsupervised deep-learning methods have demonstrated a degree of robustness and accuracy in some challenging scenes. Because they infer depth from a single input image without any ground-truth labels, they save considerable time and resources. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments, which can be improved by modifying the network structure and incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy, together with a learning framework built on this network that improves depth performance by incorporating images translated across modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. Sub-pixel convolutional layers replace the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images performs better than estimation from RGB images alone. During training, both images are translated to the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Experiments verify that, compared with the initial results obtained using only RGB images, our depth estimation network with the proposed cross-modal fusion system achieves better performance on public datasets and on a multi-modal dataset collected with our stereo vision sensor.
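
The abstract names sub-pixel convolution as the replacement for the decoder's up-sampling branches. As a rough illustration of the rearrangement step in such a layer (often called pixel shuffle), the sketch below turns a feature map with C·r² channels into one with C channels at r times the spatial resolution. It is not the authors' implementation: the function name `pixel_shuffle`, the channel ordering (the common convention of out[c, h·r+i, w·r+j] = in[c·r²+i·r+j, h, w]), and the use of NumPy are all assumptions made for illustration, and the convolution that would precede this step is omitted.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Hypothetical helper illustrating the sub-pixel rearrangement only;
    the channel ordering is an assumed convention, not taken from the paper.
    """
    channels_in, h, w = x.shape
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = channels_in // (r * r)
    # Split the channel axis into (c, i, j), where (i, j) is the sub-pixel offset.
    x = x.reshape(c, r, r, h, w)
    # Interleave offsets with spatial positions: (c, h, i, w, j).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge (h, i) and (w, j) into the up-sampled height and width.
    return x.reshape(c, h * r, w * r)

# Usage: four 2x2 low-resolution channels become one 4x4 map.
low_res = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
high_res = pixel_shuffle(low_res, r=2)
print(high_res.shape)
```

In a depth decoder, a step like this would follow a convolution that produces the C·r² channels, so the network learns the up-sampling filters instead of relying on fixed interpolation.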

Depth Estimation Using an Improved Stereo Network

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Unsupervised Learning

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Depth Estimation of Traffic Scenes from Image Sequence Using Deep Learning

On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach

Faster Self-adaptive Deep Stereo

High-Quality Depth Recovery Via Interactive Multi-view Stereo

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Depth incorporating with color improves salient object detection

Improving Unsupervised Learning of Monocular Depth and Ego-Motion Via Stereo Network

Efficient Edge-Preserving Multi-View Stereo Network for Depth Estimation

Self‐supervised binocular depth estimation algorithm with self‐rectification for autonomous driving

Brain Cholesterol XVIII: Effect of Methylphenidate (Ritalin) on [U-14C] Glucose and [2-3H] Acetate Incorporation

Unsupervised Monocular Depth Estimation with Encoder-decoder Network

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

Real-Time Stereo Image Depth Estimation Network with Group-Wise L1 Distance for Edge Devices Towards Autonomous Driving

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability

Depth Refinement for Improved Stereo Reconstruction

A Unified Framework for Depth Prediction from a Single Image and Binocular Stereo Matching