Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving technology, scene perception has become increasingly important. Unsupervised methods based on deep learning have demonstrated a certain level of robustness and accuracy in some challenging scenes. Because they infer depth from a single input image without any ground-truth labels, such methods save considerable time and resources. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments, and these shortcomings can be mitigated by modifying the network structure and incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy, together with a learning framework built around this network that improves depth performance by incorporating images translated across modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. Sub-pixel convolutional layers replace the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images outperforms depth estimation from RGB images alone. During training, both images are first translated into the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Compared with the baseline results obtained using RGB images only, experiments show that our depth estimation network with the proposed cross-modal fusion system achieves better performance on public datasets and on a multi-modal dataset collected by our stereo vision sensor.
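The abstract states that sub-pixel convolutional layers replace the up-sampling branches but gives no layer details. The following is a minimal sketch of generic sub-pixel convolution (a convolution that predicts r*r values per output channel at low resolution, followed by a pixel-shuffle rearrangement), written in dependency-free Python. The helper names `conv1x1`, `pixel_shuffle`, and `subpixel_upsample` are hypothetical, and the 1x1 kernel is an assumption chosen for brevity; it is not the authors' decoder.

```python
from typing import List

# A feature map stored as [channel][row][col].
Tensor3 = List[List[List[float]]]


def conv1x1(x: Tensor3, weight: List[List[float]], bias: List[float]) -> Tensor3:
    """1x1 convolution: mixes channels independently at every pixel.

    weight has shape [out_channels][in_channels]. A real decoder would
    likely use larger (e.g. 3x3) kernels; 1x1 is an assumption for brevity.
    """
    in_ch = len(x)
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [bias[o] + sum(weight[o][c] * x[c][h][w] for c in range(in_ch))
             for w in range(width)]
            for h in range(height)
        ]
        for o in range(len(weight))
    ]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Rearrange [C*r*r][H][W] into [C][H*r][W*r] (depth-to-space).

    Output pixel (h*r + i, w*r + j) of channel c is taken from input
    channel c*r*r + i*r + j at low-resolution position (h, w), the same
    channel ordering as PyTorch's nn.PixelShuffle.
    """
    total = len(x)
    if total % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_ch = total // (r * r)
    height, width = len(x[0]), len(x[0][0])
    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(out_ch)]
    for c in range(out_ch):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


def subpixel_upsample(x: Tensor3, weight: List[List[float]],
                      bias: List[float], r: int) -> Tensor3:
    """Sub-pixel convolution: predict r*r sub-pixel values per output
    channel at low resolution, then shuffle them into an r-times larger map."""
    return pixel_shuffle(conv1x1(x, weight, bias), r)


if __name__ == "__main__":
    # One 2x2 input channel, upscale factor r = 2, so the convolution
    # must produce 1 * 2 * 2 = 4 channels.
    feature = [[[1.0, 2.0], [3.0, 4.0]]]
    up = subpixel_upsample(feature, [[1.0], [2.0], [3.0], [4.0]], [0.0] * 4, r=2)
    for row in up[0]:
        print(row)
```

Because every learned operation runs at the low resolution and the final step is only a memory rearrangement, this design avoids the empty or duplicated pixels that interpolation-based up-sampling introduces, which is a common motivation for using it in depth super-resolution.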

Depth Prediction from Monocular Images with CGAN

Depth Estimation from Monocular Image and Coarse Depth Points Based on Conditional GAN

Conditional Generative Adversarial Network for Monocular Image Depth Map Prediction

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

Depth-Conditioned GAN for Underwater Image Enhancement

Generative Adversarial Networks for Unsupervised Monocular Depth Prediction

Monocular Depth Estimation with Guidance of Surface Normal Map

Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs

Unsupervised Learning of Depth Estimation and Camera Pose With Multi-Scale GANs

Promising Depth Map Prediction Method from a Single Image Based on Conditional Generative Adversarial Network

Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

Depth Images Could Tell Us More: Enhancing Depth Discriminability for RGB-D Scene Recognition

Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

Enhanced Monocular Depth Estimation: A CNN Integrating Semantic Segmentation Embedding And Vanishing Point Detection

Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild

AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion

Deep Monocular Depth Estimation via Integration of Global and Local Predictions

Monocular Depth Estimation using Diffusion Models