Towards Practical Consistent Video Depth Estimation

Pengzhi Li, Yikang Ding, Linge Li, Jingwei Guan, Zhiheng Li
DOI: https://doi.org/10.1145/3591106.3592264
2023-01-01
Abstract: Monocular depth estimation algorithms aim to explore the possible links between 2D and 3D data, but existing methods still struggle to predict consistent depth from a casual video. Because they rely on camera poses and optical flow during time-consuming test-time training, these methods fail in many scenarios and cannot be used in practical applications. In this work, we present a data-driven post-processing method that overcomes these challenges and supports online processing. Built on a deep recurrent network, our method takes the adjacent original and optimized depth maps as inputs, learns temporal consistency from the dataset, and achieves higher depth accuracy. Our approach can be applied to multiple single-frame depth estimation models and used in various real-world scenes in real time. In addition, to address the lack of temporally consistent video depth training datasets for dynamic scenes, we propose an approach that generates training video sequences from a single image by inferring a motion field. To the best of our knowledge, this is the first data-driven plug-and-play method to improve the temporal consistency of depth estimation for casual videos. Extensive experiments on three datasets and three depth estimation models show that our method outperforms state-of-the-art methods.
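To make the online, plug-and-play data flow concrete, here is a minimal sketch. It assumes each frame's raw depth map comes from an arbitrary single-frame model as a NumPy array. The refiner is causal: it sees only the current raw map and its own previous output, never future frames. The function name `refine_stream` and the parameter `alpha` are hypothetical. The fixed blending rule is only a placeholder standing in for the learned recurrent network described above. It is not the authors' model, and it will not reproduce their accuracy.

```python
import numpy as np


def refine_stream(raw_depths, alpha=0.7):
    """Causal, online post-processing over a stream of per-frame depth maps.

    raw_depths: iterable of 2D arrays produced by any single-frame depth model.
    alpha: hypothetical weight on the current raw prediction; a placeholder
           for the learned recurrent update, not the paper's network.
    """
    refined_prev = None  # recurrent state: the previous refined depth map
    for raw in raw_depths:
        raw = np.asarray(raw, dtype=np.float64)
        if refined_prev is None:
            # First frame: there is no history to stay consistent with.
            refined = raw
        else:
            # Placeholder update: combine the current raw map with the previous
            # refined map. The paper learns this step from data instead.
            refined = alpha * raw + (1.0 - alpha) * refined_prev
        refined_prev = refined
        yield refined  # emitted immediately, so processing stays online


# Usage: a flickering stream whose raw depth jumps between frames.
raw = [np.full((2, 2), 1.0), np.full((2, 2), 2.0), np.full((2, 2), 1.0)]
refined = list(refine_stream(raw, alpha=0.5))
```

The generator design mirrors the "online processing" claim. Each refined map is available as soon as its frame arrives, and the only state carried forward is the previous output. This matches the abstract's description of feeding the adjacent original and optimized depth maps into the next step.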