Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Xiaogang Song, Haoyue Hu, Li Liang, Weiwei Shi, Guo Xie, Xiaofeng Lu, Xinhong Hei
DOI: https://doi.org/10.1109/tmm.2023.3312950
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Recent studies have shown that convolutional neural networks (CNNs) for joint depth and pose estimation can be trained on unlabelled monocular frames. However, three problems remain: 1) CNNs extract only local features because of their limited receptive field, 2) scale ambiguity is inherent in the monocular task, and 3) problematic ("illness") regions, such as dynamic objects and outliers, violate the photometric consistency assumption and produce large errors. We propose a novel framework, ADPDepth, with a corresponding strategy for each of these problems. First, a PCAtt module is designed to capture the correlation between channels and to efficiently extract multiscale spatial information using a multibranch parallel strategy. Second, a depth-pose consistency loss, based on the geometric consistency between depth and pose, is proposed to constrain the scale across samples, eliminate scale ambiguity and obtain a globally consistent scale. Third, to further improve performance, a cover mask derived from depth-pose consistency filters out dynamic objects and outliers, reducing the adverse effects of these problematic regions. Extensive experiments are conducted on the KITTI, NYU-Depth and Make3D datasets. On public benchmarks, the experimental results confirm that ADPDepth achieves state-of-the-art performance, and ablation experiments verify the effectiveness of each strategy.
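The abstract does not give the formula for the depth-pose consistency loss or the cover mask. As a point of reference only, the sketch below implements a closely related, previously published technique: the geometric consistency loss and self-discovered mask of SC-SfMLearner (Bian et al., 2019). ADPDepth's actual loss may differ. Depth from frame a is warped into frame b with the predicted relative pose (R, t) and compared with frame b's predicted depth through D_diff = |z_b - D_b| / (z_b + D_b). The mean of D_diff is the loss, and 1 - D_diff acts as a soft mask that down-weights inconsistent pixels such as dynamic objects and outliers. The function names, the list-of-lists depth maps, the zero-skew pinhole intrinsics K and the nearest-neighbour lookup (bilinear sampling is the usual choice) are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

DepthMap = List[List[float]]  # depth_map[row][col], any consistent unit
Matrix3 = Sequence[Sequence[float]]


def warp_pixel(u: float, v: float, depth: float, K: Matrix3, R: Matrix3,
               t: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Back-project pixel (u, v) of frame a, move it into frame b with the
    relative pose (R, t) and re-project it. Returns (u_b, v_b, z_b), or None
    if the point lands behind camera b. Hypothetical helper."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # 3-D point in camera-a coordinates (pinhole model, zero skew assumed)
    p = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Rigid-body transform into camera-b coordinates: X_b = R X_a + t
    pb = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    z_b = pb[2]
    if z_b <= 0.0:
        return None  # no valid projection into frame b
    return fx * pb[0] / z_b + cx, fy * pb[1] / z_b + cy, z_b


def depth_consistency(depth_a: DepthMap, depth_b: DepthMap, K: Matrix3,
                      R: Matrix3, t: Sequence[float]) -> Tuple[float, DepthMap]:
    """SC-SfMLearner-style geometric consistency (an analogue, not ADPDepth's
    published loss). Returns (mean inconsistency, soft mask over frame a)."""
    H, W = len(depth_a), len(depth_a[0])
    mask = [[0.0] * W for _ in range(H)]  # stays 0 where no correspondence
    diffs = []
    for v in range(H):
        for u in range(W):
            warped = warp_pixel(u, v, depth_a[v][u], K, R, t)
            if warped is None:
                continue
            ub, vb, z_b = warped
            # Nearest-neighbour lookup for brevity; bilinear is usual in practice
            ui, vi = round(ub), round(vb)
            if not (0 <= ui < W and 0 <= vi < H):
                continue  # projects outside frame b
            d_b = depth_b[vi][ui]
            # Normalised depth difference in [0, 1) for positive depths
            diff = abs(z_b - d_b) / (z_b + d_b)
            diffs.append(diff)
            mask[v][u] = 1.0 - diff  # inconsistent pixels get low weight
    loss = sum(diffs) / len(diffs) if diffs else 0.0
    return loss, mask
```

With an identity pose and identical depth maps, the loss is 0 and every mask value is 1. If frame b's depth is doubled while the pose stays fixed (a pure scale mismatch), every pixel gets D_diff = 1/3. This kind of inter-frame scale inconsistency is what a depth-pose consistency term is meant to penalize.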
computer science, information systems, telecommunications, software engineering
What problem does this paper attempt to address?