CAHIR: Co-Attentive Hierarchical Image Representations for Visual Place Recognition.

Guohao Peng,Heshan Li,Yifeng Huang,Jun Zhang,Mingxing Wen,Singh Rahul,Danwei Wang
DOI: https://doi.org/10.1109/icra48891.2023.10160512
2023-01-01
Abstract:Robust visual place recognition (VPR) against significant appearance changes is crucial for the life-long operation of mobile robots. Focusing on this task, we propose a Co-Attentive Hierarchical Image Representations (CAHIR) framework for VPR, which unifies attention-sharing global and local descriptor generation into one encoding pipeline. The hierarchical descriptors are applied to a coarse-to-fine VPR system with global retrieval and local geometric verification. To explore high-quality local matches between task-relevant visual elements, a cross-attention mutual enhancement layer is introduced to strengthen the information interaction between the local descriptors. Through the proposed selective matching distillation, the mutual enhancement layer can learn from state-of-the-art local matchers in a distillation manner. After weighted cross-matching of the enhanced local descriptors, geometric verification is applied to evaluate the spatial consistency of the compared image pair. Experiments show CAHIR outperforms the existing global and local representations for VPR in terms of performance and efficiency. Quantitatively, it achieves state-of-the-art results on three city-scale benchmark datasets. Qualitatively, CAHIR proves to attach great importance to task-relevant visual elements and excels at finding local correspondences that are discriminative to the VPR task.
What problem does this paper attempt to address?