AggNet for Self-supervised Monocular Depth Estimation: Go an Aggressive Step Further

Zhi Chen, Xiaoqing Ye, Liang Du, Wei Yang, Liusheng Huang, Xiao Tan, Zhenbo Shi, Fumin Shen, Errui Ding
DOI: https://doi.org/10.1145/3474085.3475287
2021-01-01
Abstract: Because it does not rely on exhaustively labeled data, self-supervised monocular depth estimation (MDE) plays a fundamental role in computer vision. Previous methods usually adopt a one-stage MDE network, which is insufficient to achieve high performance. In this paper, we dig deep into this task and propose an aggressive framework termed AggNet. The framework combines a training-only progressive two-stage module, which performs pseudo counter-surveillance, with a simple yet effective dual-warp loss function between image pairs. In particular, we first propose a residual module, which follows the MDE network to learn a refined depth. The residual module takes both the initial depth generated by the MDE network and the initial color image as input and generates the refined depth through residual depth learning. The refined depth is then used to supervise the initial depth during training. For inference, only the MDE network is retained to regress depth from a single image, so the performance gain comes without extra computation. In addition to this self-distillation loss, a dual-warp consistency loss is introduced to encourage the MDE network to keep depth consistent between stereo image pairs. Extensive experiments show that AggNet achieves state-of-the-art performance on the KITTI and Make3D datasets.
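The refinement and self-distillation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values, and the choice of an L1 distance are all assumptions made for illustration, and the residual here is a fixed array rather than the output of a learned module.

```python
def refine_depth(initial_depth, residual):
    """Refined depth = initial depth + residual (residual depth learning)."""
    return [[d + r for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(initial_depth, residual)]


def self_distillation_loss(initial_depth, refined_depth):
    """Mean absolute gap between the initial and the refined depth.

    The refined depth is treated as a fixed pseudo-label. In an autodiff
    framework it would be detached so that this loss only updates the
    MDE network. The L1 distance is an assumption of this sketch.
    """
    diffs = [abs(d - t)
             for d_row, t_row in zip(initial_depth, refined_depth)
             for d, t in zip(d_row, t_row)]
    return sum(diffs) / len(diffs)


# Toy 2x2 depth map and a hypothetical residual prediction.
initial = [[10.0, 12.0], [8.0, 9.0]]
residual = [[0.5, -1.0], [0.0, 0.5]]

refined = refine_depth(initial, residual)
loss = self_distillation_loss(initial, refined)
```

At inference time only `initial` would be produced, which is why the residual module adds no cost once training is finished.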