Abstract: Scene perception has become increasingly important with the rapid development of artificial intelligence and autonomous driving. Unsupervised deep learning methods have shown a degree of robustness and accuracy in some challenging scenes, and because they infer depth from a single input image without ground-truth labels, they save considerable time and resources. However, unsupervised depth estimation still falls short in robustness and accuracy in complex environments, which can be improved by modifying the network structure and incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy, together with a learning framework built on this network that improves depth performance by incorporating images translated across modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. Sub-pixel convolutional layers replace the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images outperforms estimation from RGB images alone. During training, both images are translated to the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Experiments on public datasets and on a multi-modal dataset collected with our stereo vision sensor show that the proposed depth estimation network with the cross-modal fusion system achieves better performance than the initial results obtained using only RGB images.
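
As a rough illustration of the sub-pixel convolution step named in the abstract, the sketch below shows only the channel-to-space rearrangement ("pixel shuffle") that follows the convolution in such a layer. The function name `pixel_shuffle`, the channel-first array layout, and the upscale factor are assumptions made for this example; the abstract does not give the network's actual configuration.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C * r * r, H, W) feature map into (C, H * r, W * r).

    Hypothetical helper for illustration: each group of r * r channels
    fills one r x r block of output pixels, so spatial resolution grows
    by a factor of r without interpolation.
    """
    channels, height, width = x.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    out_channels = channels // (r * r)
    # Split the channel axis into (out_channels, r, r).
    x = x.reshape(out_channels, r, r, height, width)
    # Reorder to (out_channels, height, r, width, r) so that each r x r
    # sub-block sits next to the spatial position it belongs to.
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(out_channels, height * r, width * r)


# Four 1x1 channels become a single 2x2 map.
features = np.arange(4).reshape(4, 1, 1)
upsampled = pixel_shuffle(features, r=2)
print(upsampled.shape)  # (1, 2, 2)
```

In a full sub-pixel layer, a learned convolution would first produce the `C * r * r` channels; this rearrangement then takes the place of an interpolation-based up-sampling branch.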

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation
