Unsupervised Learning of Depth and Ego-Motion with Absolutely Global Scale Recovery from Visual and Inertial Data Sequences

Yanwen Meng, Qiyu Sun, Chongzhen Zhang, Yang Tang
DOI: https://doi.org/10.1080/23335777.2020.1811386
2020-01-01
Cyber-Physical Systems
Abstract: In this paper, we propose an unsupervised learning method for jointly estimating monocular depth and ego-motion that is capable of recovering the absolute scale of the global camera trajectory. To address the scale drift and scale ambiguity inherent to a monocular camera, we fuse geometric movement data from an inertial measurement unit (IMU) and use a Bi-directional Long Short-Term Memory (BiLSTM) network to extract temporal features. In addition, we add a lightweight and efficient attention mechanism, the Convolutional Block Attention Module (CBAM), to the Convolutional Neural Networks (CNNs) that extract image features. In scenes with severe illumination changes, ambiguous structures, moving objects, and occlusions, and especially in scenes with progressively varying textures, the geometric features can provide adaptive estimation results when the visual features degenerate. Experiments on the KITTI driving dataset show that our scheme achieves promising results in camera pose and depth estimation. Moreover, the absolute scale recovery for the global camera trajectory is effective.
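The scale ambiguity mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: the pinhole intrinsics, the scene point, the camera translation, and the helper names (`project`, `to_second_frame`, `reprojection_at_scale`) are all assumptions chosen for illustration. It shows that multiplying the scene depth and the camera translation by the same factor leaves the reprojected pixel unchanged, so images alone cannot fix the metric scale. An IMU measures acceleration in metric units, which is one reason inertial data can help resolve it.

```python
# Illustrative sketch only (not the paper's method): monocular scale ambiguity.
# All numeric values and function names below are hypothetical.

def project(point, fx=720.0, fy=720.0, cx=620.0, cy=190.0):
    """Project a 3-D point (x, y, z) in the camera frame to pixel coordinates
    with a pinhole model. The intrinsics are assumed values."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def to_second_frame(point, translation):
    """Express a point in a second camera frame that is displaced by
    `translation` relative to the first (pure translation, no rotation)."""
    return tuple(p - t for p, t in zip(point, translation))


def reprojection_at_scale(point, translation, s):
    """Scale the scene point and the camera translation by the same factor s,
    then reproject the point into the second camera."""
    scaled_point = tuple(s * p for p in point)
    scaled_translation = tuple(s * t for t in translation)
    return project(to_second_frame(scaled_point, scaled_translation))


scene_point = (2.0, -1.0, 10.0)    # hypothetical point, in metres
camera_motion = (0.5, 0.0, 1.0)    # hypothetical translation, in metres

# The reprojected pixel is the same (up to rounding) for every scale factor.
for s in (1.0, 0.1, 7.5):
    u, v = reprojection_at_scale(scene_point, camera_motion, s)
    print(f"scale={s:>4}: pixel=({u:.3f}, {v:.3f})")
```

Because every scale factor produces the same pixel, a photometric reprojection loss of the kind used in unsupervised depth and ego-motion learning cannot distinguish between these solutions on its own.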