Unsupervised Learning of Depth and Pose Estimation Based on Continuous Frame Window

Suning Shang, Huaimin Wang, Pengfei Zhang, Bo Ding
DOI: https://doi.org/10.1109/ijcnn.2018.8489713
2018-01-01
Abstract: We present an unsupervised learning framework for monocular depth and camera motion estimation from video sequences. In common with recent work, we use an unsupervised end-to-end learning method that requires monocular video sequences for training. What distinguishes our approach is that it not only uses image reconstruction as the supervisory signal but also exploits the pose estimation method used in traditional SLAM approaches to strengthen the supervisory signal and add training constraints. For pose estimation, a continuous frame window is used to construct the pose graph. Our method uses a single-view depth network and a multi-view pose network, trained with a loss based on warping nearby images to reconstruct the target image using the predicted depth and pose. The networks are therefore coupled by the loss during training but can be applied independently at test time. Experiments on the KITTI dataset demonstrate the effectiveness of our method: 1) our monocular depth estimation outperforms both the supervised methods that use ground-truth depth data for training and the existing unsupervised learning method, and it performs comparably with the supervised methods that use ground-truth pose data for training; 2) our pose estimation performs nearly on par with established SLAM systems under comparable input settings.
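The reconstruction loss described above can be illustrated with a minimal NumPy sketch. It assumes pinhole intrinsics `K`, grayscale images, a plain L1 photometric error, and a target-to-source pose given as a rotation `R` and translation `t`; the abstract does not specify the paper's exact loss terms, so this is an illustration of the general view-synthesis idea, not the authors' implementation. The `chain_residual` function is a hypothetical stand-in for a pose-graph constraint over a continuous frame window: it checks that chaining consecutive relative poses agrees with a directly estimated pose. All function names here are illustrative.

```python
import numpy as np

def inverse_warp(source, depth, K, R, t):
    """Reconstruct the target view by sampling `source` where each target
    pixel lands after back-projection (with `depth`) and motion (R, t).
    Assumption: R, t map target-camera coordinates into the source camera."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]                               # pixel row/col grids
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(float)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)     # back-project to 3D
    proj = K @ (R @ cam + t.reshape(3, 1))                  # project into source
    z = proj[2]
    valid = z > 1e-6                                        # points in front of camera
    zs = np.where(valid, z, 1.0)                            # avoid division by zero
    us, vs = proj[0] / zs, proj[1] / zs                     # source pixel coordinates

    # Bilinear interpolation; pixels whose 2x2 neighbourhood leaves the image are masked.
    u0, v0 = np.floor(us).astype(int), np.floor(vs).astype(int)
    valid &= (u0 >= 0) & (v0 >= 0) & (u0 + 1 < W) & (v0 + 1 < H)
    u0c, v0c = np.clip(u0, 0, W - 2), np.clip(v0, 0, H - 2)
    du, dv = us - u0c, vs - v0c
    warped = (source[v0c, u0c] * (1 - du) * (1 - dv)
              + source[v0c, u0c + 1] * du * (1 - dv)
              + source[v0c + 1, u0c] * (1 - du) * dv
              + source[v0c + 1, u0c + 1] * du * dv)
    return warped.reshape(H, W), valid.reshape(H, W)

def photometric_loss(target, source, depth, K, R, t):
    """Mean L1 difference between the target and the source warped into it."""
    warped, mask = inverse_warp(source, depth, K, R, t)
    if not mask.any():
        return 0.0
    return float(np.abs(target - warped)[mask].mean())

def to_se3(R, t):
    """Pack a rotation and translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def chain_residual(rel_poses, direct):
    """Hypothetical window constraint: composing T_{k->k+1} over the window
    should match a direct estimate of T_{0->n}. Returns the Frobenius mismatch."""
    composed = np.eye(4)
    for T in rel_poses:
        composed = T @ composed        # T_{0->k+1} = T_{k->k+1} @ T_{0->k}
    return float(np.linalg.norm(composed - direct))
```

For example, with constant depth 2, focal length 4, and a sideways translation of 0.5, every pixel shifts by exactly one column, so a target built by shifting the source one column to the left reconstructs with near-zero loss under that pose, while the identity pose leaves a clearly nonzero error. Minimizing this error with respect to the network outputs is what couples the depth and pose networks during training.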