Abstract: As a flexible passive 3D sensing method, unsupervised learning of depth from monocular videos is becoming an important research topic. Instead of the difference from the ground truth, it uses as its loss the photometric error between the target view and views synthesized from its adjacent source views. Despite significant recent progress, occlusion and scene dynamics in real-world scenes still adversely affect the learning. In this paper, we show that deliberately manipulating photometric errors can deal with these difficulties more effectively. We first propose an outlier masking technique that treats occluded or dynamic pixels as statistical outliers in the photometric error map. With outlier masking, the network learns the depth of objects moving in the opposite direction to the camera more accurately. To the best of our knowledge, such cases have not been seriously considered in previous works, even though they pose a high risk in applications such as autonomous driving. We also propose an efficient weighted multi-scale scheme to reduce artifacts in the predicted depth maps. Extensive experiments on the KITTI dataset and additional experiments on the Cityscapes dataset verify the effectiveness of the proposed approach for depth and ego-motion estimation. Furthermore, for the first time, we evaluate the predicted depth separately on dynamic-object regions and on the static background for both supervised and unsupervised methods. This evaluation further verifies the effectiveness of our approach and yields observations that may inspire future research in this direction.
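The idea of treating occluded or dynamic pixels as statistical outliers in the photometric error map can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and uses a plain per-pixel absolute difference as the photometric error (many methods combine SSIM and L1 instead). It also assumes a simple rule that flags a pixel as an outlier when its error exceeds the mean plus k standard deviations of the error map. The functions `photometric_error`, `outlier_mask`, and `masked_loss`, and the threshold rule with k = 2, are hypothetical illustrations, not the paper's exact formulation.

```python
from statistics import mean, pstdev
from typing import List

Image = List[List[float]]  # grayscale image, intensities in [0, 1]
Mask = List[List[bool]]    # True = inlier pixel kept in the loss


def photometric_error(target: Image, synthesized: Image) -> Image:
    """Per-pixel absolute intensity difference (a simple L1 stand-in)."""
    return [
        [abs(t - s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(target, synthesized)
    ]


def outlier_mask(error: Image, k: float = 2.0) -> Mask:
    """Keep a pixel unless its error exceeds mean + k * std of the error map.

    Assumption: this mean/std threshold and the default k are illustrative
    choices, not the rule used in the paper.
    """
    values = [e for row in error for e in row]
    threshold = mean(values) + k * pstdev(values)
    return [[e <= threshold for e in row] for row in error]


def masked_loss(error: Image, mask: Mask) -> float:
    """Average photometric error over inlier pixels only."""
    kept = [
        e
        for e_row, m_row in zip(error, mask)
        for e, m in zip(e_row, m_row)
        if m
    ]
    return mean(kept) if kept else 0.0


if __name__ == "__main__":
    target = [[0.5] * 3 for _ in range(3)]
    # The centre pixel simulates an occluded or dynamic region.
    synth = [[0.52, 0.49, 0.51], [0.50, 0.95, 0.48], [0.51, 0.50, 0.52]]
    err = photometric_error(target, synth)
    mask = outlier_mask(err)
    print(mask[1][1], masked_loss(err, mask))
```

In this toy example, only the centre pixel with a large error is excluded, so the loss averages the consistent pixels instead of being pulled up by the simulated occlusion.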

Revisiting Self-Supervised Monocular Depth Estimation

Monocular Depth Estimation Based on Unsupervised Learning

Digging Into Self-Supervised Monocular Depth Estimation

Self-Supervised Monocular Depth Estimation With Self-Perceptual Anomaly Handling

Self-Supervised Learning based Depth Estimation from Monocular Images

Hierarchical Multi-scale Architecture Search for Self-supervised Monocular Depth Estimation

Self-supervised monocular depth estimation via joint attention and intelligent mask loss

Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video

RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes

Embodiment: Self-Supervised Depth Estimation Based on Camera Models

Cycle-SfM: Joint Self-Supervised Learning of Depth and Camera Motion from Monocular Image Sequences

MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Monocular Depth Estimation Using Self-Supervised Learning with More Effective Geometric Constraints

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

Deeper into Self-Supervised Monocular Indoor Depth Estimation

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR

Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network

Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module