Semantic-focused Patch Tokenizer with Multi-branch Mixer for Visual Place Recognition

Xu Zhenyu, Ziliang Ren, Zhang Qieshi, Jie Lou, Tao Dacheng, Cheng Jun
DOI: https://doi.org/10.1109/icra57147.2024.10610372
2024-01-01
Abstract: Visual Place Recognition (VPR) is critical for navigation and loop closure in autonomous driving tasks, mitigating the impact of shift errors caused by dynamic changes in the environment. Because backbone networks have limited capacity and environmental changes can be extreme, current methods fail to capture the foundational semantic details that carry the distinctive attributes needed to identify a unique place. To address this problem, we propose a new visual token-guided VPR framework that contains a semantic-focused patch tokenizer and a multi-branch Mixer. To mitigate interference from place-unrelated objects, the semantic-focused patch tokenizer exploits attention-based channel selection and spatial partition, which efficiently captures important semantic information within the channels and preserves spatial relationships among the backbone features. To extract abstract features with spatial structure information, the multi-branch Mixer uses a multi-branch structure to aggregate local and global position information, improving the robustness of global representations to environmental changes. Experimental results demonstrate that our method outperforms state-of-the-art methods, achieving 85.3% Recall@1 on the MSLS val dataset and 59.1% Recall@1 on the Nordland dataset when using ResNet18 as the backbone.
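The Recall@1 figures quoted in the abstract are instances of the Recall@N retrieval metric commonly used in VPR: a query counts as correct if any of its N nearest database descriptors is a true match. The following is a minimal sketch of that metric on toy data, not the paper's evaluation code. The function name `recall_at_n`, the Euclidean distance, and the toy descriptors and ground-truth sets are all assumptions made for illustration; the abstract does not specify how matches are defined for each dataset.

```python
from math import dist


def recall_at_n(query_descs, db_descs, ground_truth, n=1):
    """Fraction of queries whose n nearest database items include a true match.

    ground_truth[i] is the set of database indices that count as correct
    for query i (hypothetical format chosen for this sketch).
    """
    hits = 0
    for qi, q in enumerate(query_descs):
        # Rank database indices by Euclidean distance to the query descriptor.
        ranked = sorted(range(len(db_descs)), key=lambda di: dist(q, db_descs[di]))
        # The query is a hit if any of its top-n neighbours is a true match.
        if any(di in ground_truth[qi] for di in ranked[:n]):
            hits += 1
    return hits / len(query_descs)


# Toy 2-D "global descriptors"; real descriptors are much higher-dimensional.
queries = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.9)]
database = [(0.1, 0.9), (1.0, 0.1), (0.0, 0.0), (0.5, 0.5)]
truth = [{0}, {1}, {2}]

r1 = recall_at_n(queries, database, truth, n=1)
r4 = recall_at_n(queries, database, truth, n=4)
print(f"Recall@1 = {r1:.3f}, Recall@4 = {r4:.3f}")
```

In this toy setup two of the three queries retrieve a correct nearest neighbour, so Recall@1 is 2/3, while Recall@4 reaches 1.0 because every database item falls within the top four.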