Self-supervised Depth Estimation with High Resolution Features and Non-local Information.

Rongying Jing, Yang Liu
DOI: https://doi.org/10.1145/3529836.3529907
2022-01-01
Abstract: Depth estimation is one of the most challenging tasks in computer vision, especially in the self-supervised setting, which avoids the need for costly labels. Self-supervised depth estimation aims to infer three-dimensional scene structure from two-dimensional images, using only image pairs or sequences as supervision. Most existing methods adopt an encoder-decoder framework with skip connections and recover high-resolution depth maps from high-resolution low-level and low-resolution high-level feature maps. However, it has been shown that high-resolution high-level feature maps, which are sensitive to illumination, color, texture, etc., are necessary for depth estimation. In this paper, we present a novel approach that extracts high-level feature maps at all scales and introduces a self-attention mechanism to capture non-local features. The main improvements of our proposed method are two-fold: 1) we incorporate a high-resolution feature extraction sub-network and extract high-resolution high-level features by connecting the high-to-low resolution convolution streams in parallel; 2) we embed the self-attention module with the feature pyramid module (FPA) to obtain general context at large-scale features. Experiments on the KITTI benchmark demonstrate that our network outperforms most existing methods and produces more accurate depth maps.
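To make the idea of "non-local" features concrete, the following is a minimal sketch of generic scaled dot-product self-attention applied to a single feature map, where every spatial position aggregates information from all other positions. It is an illustration only, not the authors' module: the abstract does not specify the layer design or how attention is coupled with the FPA, and the function name `non_local_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_attention(feat, w_q, w_k, w_v):
    """Hypothetical non-local (self-attention) block over one feature map.

    feat: array of shape (C, H, W)
    w_q, w_k: projection matrices of shape (D, C)  (assumed 1x1 convolutions)
    w_v: projection matrix of shape (C, C)
    Returns the refined feature map (C, H, W) and the attention matrix (H*W, H*W).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # flatten spatial positions: (C, N)
    q = w_q @ x                              # queries: (D, N)
    k = w_k @ x                              # keys:    (D, N)
    v = w_v @ x                              # values:  (C, N)
    scores = q.T @ k / np.sqrt(q.shape[0])   # pairwise similarity: (N, N)
    attn = softmax(scores, axis=-1)          # each row sums to 1
    out = v @ attn.T                         # position i mixes values from all j
    # Residual connection (an assumption): keep the local features and add context.
    return feat + out.reshape(c, h, w), attn
```

Because the attention matrix has one row per spatial position and one column per every other position, its cost grows quadratically with the number of positions, which is one reason such modules are typically applied selectively rather than at every layer.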