Abstract: Visual place recognition (VPR) is a challenging visual computing task in robot navigation. Most existing methods, which rely on simple CNN features or popular Transformer features, fail to learn the most salient features of place images because of the inconsistency problem common in VPR datasets, and this limits the robustness and interpretability of the model. In addition, existing state-of-the-art methods capture only general features of the original places with multi-scale CNN or Transformer features and ignore the texture characteristics present in place images, resulting in suboptimal recognition performance. To address these issues, we propose a novel visual place recognition network named Texture-enhanced Cross-domain Attention Transformer (TECD_Attention). Specifically, a cross-attention Transformer is first used to fuse deep attentive local and global features, improving the multi-scale feature representation of the recognition model. Second, a texture-enhanced cross-domain attention block is designed to construct the final feature descriptor by fusing texture features with the attentive local–global features. Then, a triplet loss function is used to match top-ranked reference places from the place database to a query place. Last, effective and efficient place re-ranking is achieved by training an adapted weakly supervised re-ranking network that relies on the similarity computed between the query place and the top-ranked places. We evaluate our approach in extensive experiments on four challenging datasets. In the top-1% candidate scenario, our model achieves average recalls of 96.2%, 94.6%, 95.9%, and 96.8% on the Tokyo 24/7, Pitts250k, VPRiCE, and SUN397 datasets, respectively. Compared with existing state-of-the-art VPR methods, TECD_Attention achieves superior performance, and we conclude that it is a robust model for robot visual place recognition in challenging environments.
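
The matching and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes descriptors are plain lists of floats compared by Euclidean distance, uses the standard triplet margin formulation, and treats "top 1% candidates" as the closest 1% of the database. The function names, the margin value, and the toy descriptors are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an illustrative assumption, not taken from the paper.
    """
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


def top_candidates(query, database, fraction=0.01):
    """Indices of the closest `fraction` of database descriptors (at least one)."""
    k = max(1, math.ceil(fraction * len(database)))
    ranked = sorted(range(len(database)),
                    key=lambda i: l2_distance(query, database[i]))
    return ranked[:k]


def recall_at_top_fraction(queries, database, ground_truth, fraction=0.01):
    """Share of queries with at least one true match among their top candidates.

    ground_truth[i] lists the database indices that are correct for queries[i].
    """
    hits = 0
    for query, positives in zip(queries, ground_truth):
        if set(top_candidates(query, database, fraction)) & set(positives):
            hits += 1
    return hits / len(queries)


# Toy 2-D descriptors, purely for illustration.
database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
queries = [[0.9, 0.0], [9.0, 9.0]]
ground_truth = [[1], [2]]

loss = triplet_loss([0.0, 0.0], [0.0, 2.0], [0.0, 1.0], margin=0.5)
recall = recall_at_top_fraction(queries, database, ground_truth, fraction=0.25)
```

In this sketch the candidates returned by `top_candidates` would be the input to a separate re-ranking stage; the weakly supervised re-ranking network described in the abstract is not modeled here.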

Enhancing Visual Place Recognition Using Discrete Cosine Transform and Difference-Based Descriptors

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

Explicit Feature Disentanglement for Visual Place Recognition Across Appearance Changes

Adversarial Feature Disentanglement for Place Recognition Across Changing Appearance

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

Hybrid CNN-Transformer Features for Visual Place Recognition

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

A Multi-Domain Feature Learning Method for Visual Place Recognition

STA-VPR: Spatio-temporal Alignment for Visual Place Recognition

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Dynamic Time Warping of Deep Features for Place Recognition in Visually Varying Conditions

TECD_Attention: Texture-enhanced and cross-domain attention modeling for visual place recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition

Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
