Color and Geometric Contrastive Learning Based Intra-Frame Supervision for Self-Supervised Monocular Depth Estimation

Yanbo Gao, Xianye Wu, Shuai Li, Xun Cai, Chuankun Li
DOI: https://doi.org/10.1109/lsp.2024.3480032
2024-10-29
IEEE Signal Processing Letters
Abstract: In recent years, self-supervised monocular depth estimation has become popular because it estimates depth without the need for ground-truth depth labels. Instead, it relies on inter-frame supervision: depth-based view synthesis reconstructs temporally adjacent frames, which indirectly supervises the estimated depth. However, such supervision weakens depth estimation in temporally incoherent regions and in regions that change little between consecutive frames. To overcome this problem, we propose a color and geometric contrastive learning based intra-frame supervision framework to enhance self-supervised monocular depth estimation. Color-contrastive learning is proposed to guide the network to learn color-invariant features, since color information is irrelevant to depth. To improve the local details of the learned features, pixel-level contrastive learning is further used to optimize the learning. Because depth estimation, as a pixel-level task, is sensitive to geometric transformations, geometric-contrastive learning is developed using an inverse geometric transformation to learn features that are equivariant to geometric data augmentation. A local plane guidance (LPG) layer with contrastive learning is further used to decompose the geometric information and enhance the geometric contrastive learning. Experiments demonstrate that the proposed method achieves the best results among the compared state-of-the-art methods on all tested quality metrics, with the largest improvements being SqRel reductions of 22.8% over the baseline Monodepth2 and 3.2% over MonoViT.
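The two intra-frame objectives described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the feature network, a global brightness gain stands in for the color augmentation, a horizontal flip stands in for the geometric augmentation, and the pixel-level contrastive loss is assumed to take an InfoNCE form, which the abstract does not specify.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize feature vectors along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def pixel_info_nce(feat_a, feat_b, temperature=0.1):
    """Pixel-level InfoNCE (assumed form): pixel i of feat_a is the positive
    for pixel i of feat_b; all other pixels of feat_b act as negatives."""
    a = l2_normalize(feat_a.reshape(-1, feat_a.shape[-1]))
    b = l2_normalize(feat_b.reshape(-1, feat_b.shape[-1]))
    logits = a @ b.T / temperature                    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def hflip(x):
    """Horizontal flip of an (H, W, C) array; it is its own inverse."""
    return x[:, ::-1, :]

def toy_encoder(img):
    """Hypothetical feature extractor: per-pixel chromaticity, which is
    unchanged by a global brightness gain."""
    return img / (img.sum(axis=-1, keepdims=True) + 1e-8)

rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(8, 8, 3))           # toy RGB frame

feat = toy_encoder(img)

# Color branch: features of the color-augmented view should match the original.
feat_color = toy_encoder(1.5 * img)
loss_color = pixel_info_nce(feat, feat_color)

# Geometric branch: encode the flipped view, then apply the inverse
# transformation to the features so pixels line up with the original view.
feat_geo = toy_encoder(hflip(img))
loss_geo_aligned = pixel_info_nce(feat, hflip(feat_geo))

# Without the inverse transformation, positives pair up mismatched pixels.
loss_geo_misaligned = pixel_info_nce(feat, feat_geo)

print(loss_color, loss_geo_aligned, loss_geo_misaligned)
```

In this toy setting the aligned geometric loss is lower than the misaligned one, which shows why the inverse transformation is needed before comparing features pixel by pixel. A real network would be trained to reduce these losses rather than satisfy them by construction.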
engineering, electrical & electronic