Self-Supervised Human Depth Estimation from Monocular Videos

Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan
DOI: https://doi.org/10.1109/cvpr42600.2020.00073
2020-01-01
Abstract: Previous methods for estimating detailed human depth often require supervised training with 'ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve for this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on data in the wild.
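The photo-consistency loss described in the abstract can be illustrated with a minimal, non-authoritative sketch. Each pixel of frame t is back-projected with its estimated depth, moved by a 3D body motion, projected into a neighboring frame s, and compared photometrically. Several assumptions here are not from the paper: the motion is transferred from the nearest SMPL vertex (a hypothetical simplification, not the authors' scheme); both frames share pinhole intrinsics `K = (fx, fy, cx, cy)`; the SMPL vertices are given in each frame's camera coordinates, so camera motion is folded into the vertex displacement; and an L1 photometric error is used. All function names are hypothetical, and images are plain grayscale lists for readability.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Image = List[List[float]]  # grayscale, indexed as img[v][u]


def nearest_vertex_motion(p: Vec3, verts_t: Sequence[Vec3], verts_s: Sequence[Vec3]) -> Vec3:
    """Hypothetical motion transfer: displace p by the motion (frame t -> s)
    of its nearest SMPL vertex. A simplification for illustration only."""
    i = min(range(len(verts_t)),
            key=lambda k: sum((p[j] - verts_t[k][j]) ** 2 for j in range(3)))
    return tuple(verts_s[i][j] - verts_t[i][j] for j in range(3))


def bilinear(img: Image, u: float, v: float) -> float:
    """Bilinearly sample img at continuous pixel coordinates (u, v).
    The caller must ensure 0 <= u <= W-1 and 0 <= v <= H-1."""
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1 = min(u0 + 1, len(img[0]) - 1)
    v1 = min(v0 + 1, len(img) - 1)
    a, b = u - u0, v - v0
    top = (1 - a) * img[v0][u0] + a * img[v0][u1]
    bot = (1 - a) * img[v1][u0] + a * img[v1][u1]
    return (1 - b) * top + b * bot


def photo_consistency_loss(img_t: Image, img_s: Image, depth_t: List[List[float]],
                           K: Tuple[float, float, float, float],
                           verts_t: Sequence[Vec3], verts_s: Sequence[Vec3],
                           mask: Optional[List[List[bool]]] = None) -> float:
    """Mean L1 difference between frame t and frame s warped by depth + body motion."""
    fx, fy, cx, cy = K
    h, w = len(img_t), len(img_t[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            d = depth_t[v][u]
            # Skip pixels without valid depth or outside the (optional) human mask.
            if d <= 0 or (mask is not None and not mask[v][u]):
                continue
            # Back-project pixel (u, v) with the estimated depth into camera space.
            p = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Move the 3D point with the assumed non-rigid body motion t -> s.
            m = nearest_vertex_motion(p, verts_t, verts_s)
            x, y, z = p[0] + m[0], p[1] + m[1], p[2] + m[2]
            if z <= 0:
                continue
            # Project into frame s with the shared intrinsics.
            us, vs = fx * x / z + cx, fy * y / z + cy
            if not (0 <= us <= w - 1 and 0 <= vs <= h - 1):
                continue  # warped point falls outside frame s
            total += abs(img_t[v][u] - bilinear(img_s, us, vs))
            count += 1
    return total / count if count else 0.0
```

If the estimated depth and motion are correct, the warped neighbor matches the frame and the loss approaches zero, so minimizing it over many frame pairs supervises depth without ground truth. A trainable version would use differentiable sampling in a deep learning framework and typically a more robust photometric term; the nested loops here only make each step explicit.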