Enhancing Visual Place Recognition with Multi-modal Features and Time-constrained Graph Attention Aggregation

Wang Zhuo, Zhang Yunzhou, Zhao Xinge, Ning Jian, Zou Dehao, Pei Meiqi
DOI: https://doi.org/10.1109/icra57147.2024.10611102
2024-01-01
Abstract: Visual place recognition (VPR) is a crucial technology for autonomous driving and robotic navigation. However, severe appearance and viewpoint changes often degrade algorithm performance. Current methods mainly use single-modality RGB images, which are sensitive to environmental changes. To address this challenge, we propose a novel multi-modal visual place recognition method that incorporates depth information as auxiliary data to make the VPR algorithm more robust. The pipeline consists of dual-branch feature extraction followed by a shared multi-modal feature fusion module based on a transformer (SFFM), which enables full interaction between semantic and structural information. Furthermore, we introduce a time-constrained graph attention aggregation (TC-GAT) that propagates node information across time and space to deal with perceptual aliasing. Extensive experiments on the Oxford RobotCar and MSLS datasets demonstrate that the proposed algorithm is not only effective under appearance changes but also competitive under opposing viewpoints.
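The abstract describes TC-GAT only at a high level. The sketch below makes three assumptions. "Time-constrained" is read as: each node, meaning one frame's global descriptor, attends only to nodes whose timestamps fall within a fixed window. The update is a standard single-head graph attention (GAT) layer. Additive attention scores are normalised by a softmax over those temporal neighbours. The function name `tc_graph_attention` and the parameters `window`, `W` and `a` are hypothetical. The sketch does not model the spatial propagation the abstract mentions and is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply matrix W (rows x cols) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def tc_graph_attention(
    feats: Sequence[Sequence[float]],
    times: Sequence[float],
    W: Sequence[Sequence[float]],
    a: Sequence[float],
    window: float,
) -> List[Vector]:
    """Hypothetical single-head GAT layer with a temporal mask.

    Node i attends only to nodes j with |t_i - t_j| <= window (i itself included).
    W, a and window are illustrative assumptions, not the paper's parameters.
    """
    proj = [matvec(W, h) for h in feats]  # W h_i for every node
    d = len(proj[0])
    out: List[Vector] = []
    for i, hi in enumerate(proj):
        # Time constraint: keep only nodes inside the temporal window.
        nbrs = [j for j in range(len(proj)) if abs(times[i] - times[j]) <= window]
        # Additive GAT score e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, hi + proj[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # New node feature: attention-weighted sum of neighbour projections.
        out.append([sum(al * proj[j][k] for al, j in zip(alphas, nbrs)) for k in range(d)])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    times = [0.0, 1.0, 10.0]  # the third frame is far away in time
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.1] * 4
    print(tc_graph_attention(feats, times, W, a, window=2.0))
```

With `window=2.0`, the frame at `t=10` has no temporal neighbours besides itself, so its descriptor passes through unchanged. The first two frames only mix with each other. Enlarging the window lets distant frames contribute, which shows why the constraint matters when aliased places are far apart in time.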