A Unified Unsupervised Learning Framework for Stereo Matching and Ego-Motion Estimation

Hengsong Li, Xuesong Zhang, Yuanqi Wang, Anlong Ming
DOI: https://doi.org/10.1109/icip.2019.8803550
2019-01-01
Abstract: Learning to estimate depth and ego-motion from video sequences with deep convolutional networks is attracting significant attention because of its potentially wide range of computer vision applications. Most prior work on unsupervised depth learning uses monocular video sequences as network input. However, such results require a scale factor, computed frame by frame, to maintain a stable relative scale. In this paper, we propose an unsupervised learning framework for joint depth and ego-motion estimation from stereo sequences. Stereo sequences provide both spatial (left-to-right) and temporal (forward-to-backward) photometric warping constraints as the training signal, and they allow the scene depth and camera pose to be recovered at absolute scale, which is of great significance for vision-based guidance. Experiments on the KITTI driving dataset show that our framework outperforms state-of-the-art unsupervised neural network methods.
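The spatial photometric constraint and the absolute-scale property mentioned in the abstract can be illustrated with a minimal sketch. It assumes a rectified stereo pair, grayscale images stored as nested lists, and a disparity map (in pixels) defined on the left view; the function names and the plain L1 loss are hypothetical illustrative choices, not the authors' implementation. Because the stereo baseline B is known in metres, the depth Z = fB/d comes out in metric units, which is why stereo training does not need the per-frame scale factor that monocular methods require.

```python
# Hypothetical sketch: rectified stereo pair, grayscale images as nested
# lists, disparity in pixels defined on the left image. Not the paper's code.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Metric depth Z = f * B / d for a rectified stereo rig."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    Uses linear interpolation along each row; pixels whose source
    location falls outside the right image are marked None.
    """
    height, width = len(right), len(right[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            # In a rectified pair, left pixel x matches right pixel x - d.
            xs = x - disparity[y][x]
            if xs < 0 or xs > width - 1:
                row.append(None)  # no valid correspondence
                continue
            x0 = int(xs)                  # floor, since xs >= 0
            x1 = min(x0 + 1, width - 1)   # clamp at the right border
            t = xs - x0                   # interpolation weight
            row.append((1 - t) * right[y][x0] + t * right[y][x1])
        warped.append(row)
    return warped


def photometric_l1(target, reconstructed):
    """Mean absolute intensity difference over valid (non-None) pixels."""
    diffs = [abs(a - b)
             for row_t, row_r in zip(target, reconstructed)
             for a, b in zip(row_t, row_r)
             if b is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

For example, with assumed KITTI-like values f = 720 px and B = 0.54 m, `depth_from_disparity(36, 720, 0.54)` gives about 10.8 m. The temporal (forward/backward) constraint follows the same pattern but warps between consecutive frames using the predicted depth and camera pose; it is omitted here for brevity.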