Abstract: In recent years, with the rapid development of artificial intelligence and autonomous driving, scene perception has become increasingly important. Unsupervised deep-learning methods have demonstrated a certain level of robustness and accuracy in some challenging scenes. Because they infer depth from a single input image without any ground-truth labels, they save considerable time and resources. However, unsupervised depth estimation still lacks robustness and accuracy in complex environments, which can be improved by modifying the network structure and incorporating information from other modalities. In this paper, we propose an unsupervised monocular depth estimation network that achieves high speed and accuracy, together with a learning framework built on this network that improves depth estimation by incorporating images translated across modalities. The depth estimator is an encoder-decoder network that generates multi-scale dense depth maps. Sub-pixel convolutional layers replace the up-sampling branches to obtain super-resolved depth. Cross-modal depth estimation using near-infrared and RGB images outperforms estimation from RGB images alone. During training, both images are translated to the same modality, and super-resolved depth estimation is then carried out for each stereo camera pair. Experiments on public datasets and on a multi-modal dataset collected with our stereo vision sensor show that our depth estimation network with the proposed cross-modal fusion system outperforms the initial results obtained using only RGB images.
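The abstract does not spell out the sub-pixel convolutional layer, so the following is only a minimal sketch of its rearrangement step (often called pixel shuffle), written in plain Python. It assumes the common channel ordering in which input channel c*r*r + i*r + j fills sub-pixel offset (i, j) of output channel c. The function name `pixel_shuffle` and the nested-list layout are illustrative choices, not the authors' implementation, and the learned convolution that would normally produce the r*r channel groups is omitted.

```python
from typing import List

Grid = List[List[float]]  # one feature map: H rows of W values


def pixel_shuffle(channels: List[Grid], r: int) -> List[Grid]:
    """Rearrange C*r*r feature maps of size HxW into C maps of size (H*r)x(W*r).

    Assumed ordering: input channel c*r*r + i*r + j supplies the value at
    sub-pixel offset (row i, column j) inside each r x r output block.
    """
    if r < 1 or not channels or len(channels) % (r * r) != 0:
        raise ValueError("number of channels must be a positive multiple of r*r")
    h, w = len(channels[0]), len(channels[0][0])
    c_out = len(channels) // (r * r)
    # Allocate the upsampled output maps.
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        # Each low-resolution pixel expands into an r x r block.
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1x2 feature maps become one 2x4 map for an upscale factor of 2.
feats = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(feats, 2)
```

Because the rearrangement only moves values, all of the upsampling capacity sits in the convolution that precedes it. This is the usual motivation for using such a layer in place of interpolation-based up-sampling branches.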

Spatiotemporally Enhanced Photometric Loss for Self-Supervised Monocular Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Spatiotemporal Guided Self-Supervised Depth Completion from LiDAR and Monocular Camera

Enhancing Self-supervised Monocular Depth Estimation Via Incorporating Robust Constraints

Self-Supervised Monocular Depth Estimation with Multi-constraints

Self-supervised Monocular Depth Estimation with Multi-Scale Structure Similarity Loss

Towards Loss Balance and Consistent Model in Self-supervised Monocular Depth Estimation

Unsupervised Monocular Depth and Pose Estimation Using Multiple Masks Based on Photometric and Geometric Consistency

Unsupervised Monocular Depth Estimation Based on Dual Attention Mechanism and Depth-Aware Loss

Self-Supervised Monocular Depth Estimation Via Effective Local Consistency and Multi-scale Attention

Self-supervised Monocular Depth Estimation with Self-Distillation and Dense Skip Connection

SPDepth: Enhancing Self-Supervised Indoor Monocular Depth Estimation Via Self-Propagation

Triaxial Squeeze Attention Module and Mutual-Exclusion Loss Based Unsupervised Monocular Depth Estimation

Monocular Depth Estimation Using Self-Supervised Learning with More Effective Geometric Constraints

Toward Better SSIM Loss for Unsupervised Monocular Depth Estimation

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation

TSUDepth: Exploring Temporal Symmetry-Based Uncertainty for Unsupervised Monocular Depth Estimation

Self-Supervised Monocular Depth Learning in Low-Texture Areas

Self-supervised monocular depth estimation via joint attention and intelligent mask loss

Self-Supervised Monocular Depth Estimation With Self-Perceptual Anomaly Handling