Abstract: As a flexible means of passive 3D sensing, unsupervised learning of depth from monocular videos is becoming an important research topic. It uses the photometric errors between the target view and the views synthesized from its adjacent source views as the loss, instead of the difference from the ground truth. Despite significant recent progress, occlusion and scene dynamics in real-world scenes still adversely affect the learning. In this paper, we show that deliberately manipulating photometric errors can deal with these difficulties more effectively. We first propose an outlier masking technique that treats the occluded or dynamic pixels as statistical outliers in the photometric error map. With the outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately. To the best of our knowledge, such cases have not been seriously considered in previous work, even though they pose a high risk in applications like autonomous driving. We also propose an efficient weighted multi-scale scheme to reduce the artifacts in the predicted depth maps. Extensive experiments on the KITTI dataset and additional experiments on the Cityscapes dataset have verified the proposed approach's effectiveness for depth or ego-motion estimation. Furthermore, for the first time, we evaluate the predicted depth on the regions of dynamic objects and static background separately for both supervised and unsupervised methods. The evaluation further verifies the effectiveness of our proposed approach and provides some interesting observations that might inspire future research in this direction.
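
The outlier masking idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `outlier_mask` and `masked_photometric_loss`, the per-pixel L1 error, and the percentile thresholds (`lower_pct`, `upper_pct`) are all assumptions chosen for illustration. The sketch only shows the general principle of excluding pixels whose photometric error falls outside a chosen percentile range before averaging the loss.

```python
import numpy as np


def outlier_mask(error_map, lower_pct=0.0, upper_pct=90.0):
    """Return a boolean mask that is True for pixels kept as inliers.

    The percentile bounds are hypothetical defaults, not values
    taken from the paper.
    """
    lo = np.percentile(error_map, lower_pct)
    hi = np.percentile(error_map, upper_pct)
    return (error_map >= lo) & (error_map <= hi)


def masked_photometric_loss(target, synthesized, upper_pct=90.0):
    """Average the per-pixel L1 photometric error over inlier pixels only.

    target, synthesized: float arrays of shape (H, W, C).
    Returns the scalar loss and the inlier mask of shape (H, W).
    """
    # Per-pixel photometric error: mean absolute difference over channels.
    error_map = np.abs(target - synthesized).mean(axis=-1)
    # Pixels with unusually large error (for example occluded or moving
    # regions) are treated as outliers and dropped from the loss.
    mask = outlier_mask(error_map, upper_pct=upper_pct)
    return float(error_map[mask].mean()), mask
```

In a training setup, the same mask would typically be computed per image and per source view, and the masked pixels would simply contribute no gradient; how the thresholds are chosen in practice is left open here.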

UnLearnerMC: Unsupervised Learning of Dense Depth and Camera Pose Using Mask and Cooperative Loss

Monocular Depth Estimation Based on Unsupervised Learning

Unsupervised Learning of Monocular Depth and Large-Ego-Motion with Multiple Loop Consistency Losses

Unsupervised Learning of Depth and Pose Estimation Based on Continuous Frame Window.

Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks

Unsupervised Monocular Depth and Pose Estimation Using Multiple Masks Based on Photometric and Geometric Consistency

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

MuDeepNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose Using Multi-view Consistency Loss

Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor.

Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments Via Hybrid Masks

Unsupervised Learning of Depth, Optical Flow and Pose With Occlusion From 3D Geometry

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Unsupervised Learning of Monocular Depth and Ego-Motion with Space–temporal-Centroid Loss

Self-supervised monocular depth estimation via joint attention and intelligent mask loss

An Adaptive Unsupervised Learning Framework For Monocular Depth Estimation

Unsupervised Framework for Depth Estimation and Camera Motion Prediction from Video.

Unsupervised Monocular Depth Estimation for Monocular Visual SLAM Systems

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

Self-supervised Multi-frame Monocular Depth Estimation with Pseudo-LiDAR Pose Enhancement.

Temporal-Aware SfM-Learner: Unsupervised Learning Monocular Depth and Motion from Stereo Video Clips.

Masked GAN for Unsupervised Depth and Pose Prediction with Scale Consistency