Abstract:In recent years, with the vigorous development of artificial intelligence and autonomous driving technology, the importance of scene perception technology is increasing. Unsupervised deep learning based methods have demonstrated a certain level of robustness and accuracy in some challenging scenes. By inferring depth from a single input image without any ground truth label, a lot of time and resources can be saved. However, unsupervised depth estimation has defects in robustness and accuracy under complex environment which could be improved by modifying network structure and incorporating other modal information. In this paper, we propose an unsupervised, monocular depth estimation network achieving high speed and accuracy, and a learning framework with our depth estimation network to improve depth performance by incorporating transformed images across different modalities. The depth estimator is an encoder-decoder network to generate the multi-scale dense depth map. The sub-pixel convolutional layer is adopted to obtain depth super-resolution by replacing the up-sample branches. The cross-modal depth estimation using near-infrared image and RGB image enhances the performance of depth estimation than pure RGB image. The training mode is to transfer both images to the same modality and then carry out super-resolved depth estimation for each stereo camera pair. Compared with the initial results of depth estimation using only RGB images, the experiment verifies that our depth estimation network with the cross-modal fusion system designed in this paper achieves better performance on public datasets and a multi-modal dataset collected by our stereo vision sensor.

Towards Practical Consistent Video Depth Estimation.

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Unsupervised Learning

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Learning Temporally Consistent Video Depth from Video Diffusion Priors

Consistent video depth estimation

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Temporally Consistent Depth Map Estimation Based On 3d-Mrf

Consistent Depth of Moving Objects in Video

Spatio-Temporal Segmentation with Depth-Inferred Videos of Static Scenes

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Consistent depth maps recovery from a video sequence.

Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth

Consistent Depth Maps Recovery from a Trinocular Video Sequence.

Robust Consistent Video Depth Estimation

A Temporally Streamlined Optimization Method for Stereo Video Correspondence

Video Depth without Video Models

Boosting Monocular Depth Estimation with Sparse Guided Points

Spatio-Temporal Depth Recovery of Dynamic Scenes with Multiple Handheld Cameras

Multiview Video Depth Estimation with Spatial-Temporal Consistency

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data