Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data

Chao Chen,Xinhao Liu,Xuchu Xu,Yiming Li,Li Ding,Ruoyu Wang,Chen Feng
DOI: https://doi.org/10.1109/LRA.2024.3495584
2024-11-19
Abstract:Visual place recognition (VPR) using deep networks has achieved state-of-the-art performance. However, most of them require a training set with ground truth sensor poses to obtain positive and negative samples of each observation's spatial neighborhood for supervised learning. When such information is unavailable, temporal neighborhoods from a sequentially collected data stream could be exploited for self-supervised training, although we find its performance suboptimal. Inspired by noisy label learning, we propose a novel self-supervised framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods. Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification. We conduct auto-labeling and generalization tests on both simulated and real datasets, with either RGB images or point clouds as inputs. The results show that our method outperforms self-supervised baselines in recall rate, robustness, and heading diversity, a novel metric we propose for VPR. Our code and datasets can be found at <a class="link-external link-https" href="https://ai4ce.github.io/TF-VPR/" rel="external noopener nofollow">this https URL</a>
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?