Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving technology, scene perception has become increasingly important. Unsupervised deep learning methods have demonstrated a certain level of robustness and accuracy in some challenging scenes. Because these methods infer depth from a single input image without any ground-truth labels, they save considerable time and resources. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments. Both can be improved by modifying the network structure and by incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy. We also propose a learning framework built around this network that improves depth performance by incorporating images translated across different modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. A sub-pixel convolutional layer replaces the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images outperforms estimation from RGB images alone. During training, both images are first translated into the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Experiments verify that our depth estimation network, combined with the cross-modal fusion system designed in this paper, outperforms the initial RGB-only depth estimation results on public datasets and on a multi-modal dataset collected by our stereo vision sensor.
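The abstract states that a sub-pixel convolutional layer replaces the decoder's up-sampling branches. The rearrangement step at the heart of such a layer can be sketched as follows. This is a minimal sketch that assumes the standard pixel-shuffle (depth-to-space) ordering used by `torch.nn.PixelShuffle`. In that scheme, a preceding convolution (omitted here) outputs C·r² channels at low resolution, and the rearrangement moves them into a grid that is r times larger in each spatial dimension. The function name `pixel_shuffle` and the nested-list `FeatureMap` layout are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

# Hypothetical layout: feature map indexed as fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def pixel_shuffle(fmap: FeatureMap, r: int) -> FeatureMap:
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) map.

    Assumes the depth-to-space ordering of torch.nn.PixelShuffle:
        out[c][row*r + i][col*r + j] = fmap[c*r*r + i*r + j][row][col]
    """
    n_in = len(fmap)
    if r < 1 or n_in % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    n_out = n_in // (r * r)
    height, width = len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(n_out)]
    for c in range(n_out):
        for i in range(r):          # vertical sub-pixel offset
            for j in range(r):      # horizontal sub-pixel offset
                src = fmap[c * r * r + i * r + j]
                for row in range(height):
                    for col in range(width):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 depth map (upscale factor r = 2).
low_res = [[[0.1]], [[0.2]], [[0.3]], [[0.4]]]
print(pixel_shuffle(low_res, 2))  # [[[0.1, 0.2], [0.3, 0.4]]]
```

The rearrangement has no parameters, so all learnable up-sampling happens in the preceding convolution. That convolution runs at the lower resolution, which is a commonly cited reason for preferring sub-pixel layers over up-sampling branches that operate at full resolution.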

To Complete or to Estimate, That is the Question: A Multi-Task Approach to Depth Completion and Monocular Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Least Square Estimation Network for Depth Completion

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

Semantic-guided Depth Completion from Monocular Images and 4D Radar Data

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera

Depth Is All You Need for Monocular 3D Detection

Learning an Efficient Multimodal Depth Completion Model

Depth Completion Towards Different Sensor Configurations Via Relative Depth Map Estimation and Scale Recovery

Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion

SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion

A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

DEUX: Active Exploration for Learning Unsupervised Depth Perception

Depth Map Completion by Jointly Exploiting Blurry Color Images and Sparse Depth Maps

PanDepth: Joint Panoptic Segmentation and Depth Completion

Deep Depth Completion from Extremely Sparse Data: A Survey

UAMD-Net: A Unified Adaptive Multimodal Neural Network for Dense Depth Completion

G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data