Visual Odometry Based on Convolutional Neural Networks for Large-Scale Scenes

Xuyang Meng, Chunxiao Fan, Yue Ming
DOI: https://doi.org/10.1117/12.2524278
2019-01-01
Abstract: The task of visual odometry (VO) is to estimate camera motion and image depth. It is a main part of 3D reconstruction and the front-end of simultaneous localization and mapping (SLAM). However, most existing methods either have low accuracy or require advanced sensors. To predict camera pose and image depth simultaneously and with high accuracy from image sequences captured by a monocular camera, we train a novel framework named MD-Net, which is based on convolutional neural networks (CNNs). It has two main modules: a camera motion estimator, which estimates the 6-DoF camera pose, and a depth estimator, which computes the depth of the camera's view. The key property of the proposed framework is that we can train the two estimators independently while still predicting depth and camera motion simultaneously. In addition, the motion estimator contains shared convolutional layers followed by two branches that estimate camera orientation and translation, respectively. Experiments on the KITTI and TUM datasets show that the proposed method produces meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations in large-scale scenes, even texture-less ones. It outperforms previous methods in terms of accuracy and robustness.
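To make the notion of frame-to-frame rotations and translations concrete, the following minimal sketch shows how such relative estimates are conventionally chained into a camera trajectory. It is not taken from the paper: the function names (`rotation_z`, `accumulate`) are hypothetical, the composition convention (each relative motion expressed in the previous frame's coordinates) is an assumption, and for brevity only a yaw rotation is used rather than a full 3-DoF orientation.

```python
import math

def rotation_z(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def accumulate(relative_motions):
    """Chain frame-to-frame (R, t) pairs into global camera positions.

    Assumption: each (R, t) gives the pose of frame k+1 expressed in the
    coordinate system of frame k, so poses compose as T_global = T_global * T_rel.
    """
    R_global = rotation_z(0.0)          # identity orientation at the start
    t_global = [0.0, 0.0, 0.0]          # start at the origin
    trajectory = [list(t_global)]
    for R_rel, t_rel in relative_motions:
        # Express the relative translation in the global frame, then add it.
        step = mat_vec(R_global, t_rel)
        t_global = [t_global[i] + step[i] for i in range(3)]
        # Update the global orientation with the relative rotation.
        R_global = mat_mul(R_global, R_rel)
        trajectory.append(list(t_global))
    return R_global, trajectory
```

As a usage example, feeding four identical motions (one unit forward along x, then a 90-degree left turn) traces a unit square and returns the camera to its starting position. Because each step builds on the previous pose, any error in a single frame-to-frame estimate propagates to all later positions, which is why per-frame accuracy matters in large-scale scenes.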