OnionNet: Single-View Depth Prediction and Camera Pose Estimation for Unlabeled Video
Tianhao Gu, Zhe Wang, Dongdong Li, Hai Yang, Wenli Du, Yangming Zhou
DOI: https://doi.org/10.1109/tcds.2020.3042521
IF: 4.546
2021-12-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: In real scenes, humans can easily infer their own position and their distance from other objects using only their eyes. To give robots the same visual ability, this article presents OnionNet, an unsupervised framework consisting of LeafNet and ParachuteNet, for single-view depth prediction and camera pose estimation. To speed up convergence and to make predicted objects more concrete despite gradient locality and moving objects in videos, LeafNet adopts two decoders and enhanced upconvolution modules. Meanwhile, to improve robustness to fast camera movement and rotation, ParachuteNet integrates three pose networks that, combined with a modified image preprocessing step, estimate multiview camera pose parameters. Unlike existing methods, single-view depth prediction and camera pose estimation are trained view by view: the view range shrinks gradually from one view to the next, so that outer pixels disappear in the next view, similar to peeling an onion. Moreover, LeafNet is optimized with the pose parameters from each pose network in turn. Experimental results on the KITTI data set show the effectiveness of the method: single-view depth prediction outperforms most supervised and unsupervised methods that address the same two subtasks, and pose estimation achieves state-of-the-art performance compared with existing methods under comparable input settings.
robotics, computer science, artificial intelligence, neurosciences