Unsupervised Learning of Monocular Depth and Ego-motion in Outdoor/Indoor Environments
Ruipeng Gao, Xuan Xiao, Weiwei Xing, Chi Li, Lei Liu
DOI: https://doi.org/10.1109/jiot.2022.3151629
IF: 10.6
2022-01-01
IEEE Internet of Things Journal
Abstract: Visual-based unsupervised learning [1]–[3] has emerged as a promising approach to estimating monocular depth and ego-motion, avoiding the intensive effort of collecting and labeling ground truth. However, these methods are still restrained by the brightness constancy assumption across video sequences, which makes them especially susceptible to frequent illumination variations and nearby textureless surroundings in indoor environments. In this article, we selectively combine the complementary strengths of visual and inertial measurements (videos extract static and distinct features, while inertial readings depict scale-consistent and environment-agnostic movements) and propose a novel unsupervised learning framework that predicts monocular depth and the ego-motion trajectory simultaneously. We address this challenging task by learning both forward and backward inertial sequences to eliminate inevitable noise, and by reweighting visual and inertial features via gated neural networks across various environments and user-specific motion dynamics. In addition, we employ structure cues to produce scene depths from a single image and explore structure consistency constraints to calibrate the depth estimates in indoor buildings. Experiments on the outdoor KITTI data set and our dedicated indoor prototype show that our approach consistently outperforms the state of the art on both depth and ego-motion estimates. To the best of our knowledge, this is the first work to fuse visual and inertial data without any supervision signals for monocular depth and ego-motion estimation, and our solution remains effective and robust even in textureless indoor scenarios.
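The abstract mentions reweighting visual and inertial features with gated neural networks but does not specify the gate architecture. The following is a minimal NumPy sketch of generic sigmoid-gated (soft) feature fusion, assuming fixed-length feature vectors from a visual encoder and an inertial encoder. The function `gated_fusion`, the weight matrices `W_v`, `W_i`, `b_v`, `b_i`, and the feature sizes are hypothetical, and random weights stand in for learned ones; this illustrates the general idea, not the authors' network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_visual, f_inertial, W_v, W_i, b_v, b_i):
    """Hypothetical soft fusion: reweight each modality with per-channel sigmoid gates."""
    # Both gates see the concatenated context from the two modalities.
    f_cat = np.concatenate([f_visual, f_inertial], axis=-1)
    # Per-channel gates in (0, 1), one per visual channel and one per inertial channel.
    g_v = sigmoid(f_cat @ W_v + b_v)
    g_i = sigmoid(f_cat @ W_i + b_i)
    # Element-wise reweighting, then concatenation for a downstream pose regressor.
    fused = np.concatenate([g_v * f_visual, g_i * f_inertial], axis=-1)
    return fused, g_v, g_i

# Usage with random stand-in features and weights (assumed sizes: 8 visual, 4 inertial).
rng = np.random.default_rng(0)
d_v, d_i = 8, 4
f_v = rng.standard_normal(d_v)
f_i = rng.standard_normal(d_i)
W_v = 0.1 * rng.standard_normal((d_v + d_i, d_v))
W_i = 0.1 * rng.standard_normal((d_v + d_i, d_i))
b_v = np.zeros(d_v)
b_i = np.zeros(d_i)

fused, g_v, g_i = gated_fusion(f_v, f_i, W_v, W_i, b_v, b_i)
print(fused.shape)  # (12,)
```

Because each gate lies in (0, 1) per channel, such a network can down-weight visual channels in, for example, textureless indoor frames and lean more on inertial channels. In training, the gate weights would presumably be learned jointly with the depth and pose networks rather than fixed as here.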
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic