Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving technology, scene perception has become increasingly important. Unsupervised deep-learning-based methods have demonstrated a certain level of robustness and accuracy in some challenging scenes. Inferring depth from a single input image without any ground-truth labels saves considerable time and resources. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments, which can be improved by modifying the network structure and incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy, together with a learning framework built around this network that improves depth performance by incorporating images translated across different modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. Sub-pixel convolutional layers replace the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images improves depth estimation performance over using RGB images alone. During training, both images are translated to the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Experiments verify that, compared with the baseline results obtained using only RGB images, our depth estimation network with the proposed cross-modal fusion system achieves better performance on public datasets and on a multi-modal dataset collected with our stereo vision sensor.
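
The sub-pixel convolution mentioned in the abstract can be illustrated by its rearrangement step, often called pixel shuffle. The sketch below is not the authors' implementation: the function name `pixel_shuffle`, the nested-list tensor layout, and the channel ordering (which follows the common convention of splitting channels into r*r groups) are all assumptions made for illustration. It shows only how a low-resolution map with C*r*r channels is rearranged into a map with C channels and r times the height and width; the convolution that would precede this step in a real decoder is omitted.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list.

    Hypothetical helper for illustration only; the channel ordering is an
    assumption, not taken from the paper.
    """
    channels_in = len(feature_maps)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    channels_out = channels_in // (r * r)

    # Allocate the up-sampled output, filled with placeholders.
    output = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]

    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r output block
            for j in range(r):      # column offset inside each r x r output block
                src = feature_maps[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        output[c][h * r + i][w * r + j] = src[h][w]
    return output


# Four 1x2 feature maps become one 2x4 map when r = 2.
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
high_res = pixel_shuffle(low_res, 2)
```

Because the up-sampling is a pure rearrangement of channels produced by a learned convolution at low resolution, the expensive computation stays at the smaller spatial size, which is one plausible reason such a layer suits a network aiming at high speed.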

Depth-Guided Aggregation for Real-Time Binocular Depth Estimation Network

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection

PADENet: an Efficient and Robust Panoramic Monocular Depth Estimation Network for Outdoor Scenes

Real-Time Stereo Image Depth Estimation Network with Group-Wise L1 Distance for Edge Devices Towards Autonomous Driving

Monocular Depth Estimation With Affinity, Vertical Pooling, And Label Enhancement

HA-Bins: Hierarchical Adaptive Bins for Robust Monocular Depth Estimation across Multiple Datasets

Monocular Depth Estimation Based on Dilated Convolutions and Feature Fusion

Lightweight Monocular Depth Estimation with an Edge Guided Network

Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement

DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation

MBUDepthNet: Real-Time Unsupervised Monocular Depth Estimation Method for Outdoor Scenes

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability

Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation

Re-Parameterized Real-Time Stereo Matching Network Based on Mixed Cost Volumes Toward Autonomous Driving

A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

BRNet: Exploring Comprehensive Features for Monocular Depth Estimation

LW-Net: A Lightweight Network for Monocular Depth Estimation

Boundary-induced and scene-aggregated network for monocular depth prediction