Learning Structure Affinity for Video Depth Estimation.

Yuanzhouhan Cao, Yidong Li, Haokui Zhang, Chao Ren, Yifan Liu
DOI: https://doi.org/10.1145/3474085.3475564
2021-01-01
Abstract: Depth estimation is a structure learning problem. The affinity among neighbouring pixels plays an important role in inferring depth values. In this paper, we propose to learn structure affinity in both the spatial and temporal domains for accurate depth estimation from monocular videos. Specifically, we first propose a convolutional spatial temporal propagation network (CSTPN) that learns affinity among neighbouring video frames. Secondly, we employ a structure knowledge distillation scheme that transfers the spatial temporal affinity learned by a cumbersome network to a compact network. By calculating pixel-wise similarities between neighbouring frames and neighbouring sequences, our knowledge distillation scheme efficiently captures both short-term and long-term spatial temporal affinity. Finally, we apply a warping loss based on optical flow between video frames to further enforce the temporal affinity. Experimental results show that our proposed depth estimation approach outperforms state-of-the-art methods on both indoor and outdoor benchmark datasets.
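The idea of distilling pixel-wise similarities between neighbouring frames can be illustrated with a small sketch. The abstract does not state which similarity measure or loss the paper uses, so the cosine similarity and mean squared error below are assumptions, and the function names (`affinity`, `affinity_distillation_loss`) are hypothetical. This is not the authors' implementation.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def affinity(feats_a, feats_b):
    """Cosine similarity between every pixel of frame A and every pixel of frame B.

    feats_a, feats_b: lists of per-pixel feature vectors, one vector per pixel.
    Returns a len(feats_a) x len(feats_b) matrix as nested lists.
    Assumption: cosine similarity stands in for the paper's similarity measure.
    """
    a = [normalize(f) for f in feats_a]
    b = [normalize(f) for f in feats_b]
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in b] for fa in a]


def affinity_distillation_loss(teacher_a, teacher_b, student_a, student_b):
    """Mean squared difference between teacher and student affinity matrices.

    Assumption: squared error stands in for the paper's distillation loss.
    Teacher and student may use different channel counts, because only the
    pixel-by-pixel affinity matrices are compared.
    """
    t = affinity(teacher_a, teacher_b)
    s = affinity(student_a, student_b)
    diffs = [(tv - sv) ** 2 for tr, sr in zip(t, s) for tv, sv in zip(tr, sr)]
    return sum(diffs) / len(diffs)
```

In this sketch, frames A and B would be two neighbouring frames for short-term affinity. Applying the same comparison to features drawn from neighbouring sequences would be one way to cover the long-term case, again as an assumption.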