Abstract: Visual place recognition (VPR) is a challenging visual computing task in robot navigation. Most existing methods, whether based on simple CNN features or popular Transformer features, fail to learn the most salient features of place images because of the inconsistency problem common in VPR datasets, which limits the robustness and interpretability of the model. In addition, existing state-of-the-art methods capture only general features of the original places with multi-scale CNN or Transformer features and ignore the texture characteristics of place images, resulting in suboptimal recognition performance. To address these issues, we propose a novel visual place recognition network named Texture-enhanced Cross-domain Attention Transformer (TECD_Attention). Specifically, a cross-attention Transformer is first used to fuse deep attentive local and global features, improving the multi-scale feature representation of the recognition model. Second, a texture-enhanced cross-domain attention block is designed to construct the final feature descriptor by fusing texture features and attentive local–global features. Then, a triplet loss function is used to match top-ranked reference places from the place database to a query place. Finally, effective and efficient place re-ranking is achieved by training an adapted weakly supervised re-ranking network that relies on the similarity computed between the query place and the top-ranked places. We evaluate our approach in extensive experiments on four challenging datasets. Our model achieves 96.2%, 94.6%, 95.9%, and 96.8% average recall under the top 1% candidate scenario on the Tokyo 24/7, Pitts250k, VPRiCE, and SUN397 datasets, respectively. Compared with existing state-of-the-art VPR methods, TECD_Attention performs better on robot place recognition in challenging environments.
Hence, we can conclude that this is a robust model for robot visual place recognition in challenging environments.
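
To make the training objective and the evaluation metric in the abstract more concrete, the following is a minimal NumPy sketch of a triplet-style margin loss and of recall under a "top 1% candidates" criterion. It is not the authors' implementation: the function names, the Euclidean distance, the margin value of 0.1, and the definition of a hit (at least one correct reference among the retrieved candidates) are all assumptions made for illustration.

```python
import math
import numpy as np


def triplet_margin_loss(query, positive, negative, margin=0.1):
    """Hinge-style triplet loss on Euclidean distances.

    The margin value is an assumed hyperparameter, not taken from the paper.
    """
    d_pos = np.linalg.norm(query - positive, axis=-1)  # query to matching place
    d_neg = np.linalg.norm(query - negative, axis=-1)  # query to non-matching place
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def recall_at_top_percent(query_desc, db_desc, ground_truth, percent=1.0):
    """Fraction of queries with at least one correct reference among the
    closest `percent` percent of database descriptors (hypothetical helper)."""
    k = max(1, math.ceil(len(db_desc) * percent / 100.0))
    # Pairwise Euclidean distances, shape (num_queries, num_db).
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=-1)
    top_k = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest references
    hits = [bool(set(row.tolist()) & set(gt)) for row, gt in zip(top_k, ground_truth)]
    return float(np.mean(hits))
```

With a database of 10,000 reference descriptors, `percent=1.0` would retrieve the 100 nearest candidates per query; a re-ranking stage such as the one described in the abstract would then reorder only those candidates.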

CAHIR: Co-Attentive Hierarchical Image Representations for Visual Place Recognition.

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Enhancing Visual Place Recognition Using Discrete Cosine Transform and Difference-Based Descriptors

Attention-Aware Age-Agnostic Visual Place Recognition

Intelligent Reference Curation for Visual Place Recognition via Bayesian Selective Fusion

TECD_Attention: Texture-enhanced and cross-domain attention modeling for visual place recognition

SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

Hybrid CNN-Transformer Features for Visual Place Recognition

Deep Homography Estimation for Visual Place Recognition

Gicnet: global information capture network for visual place recognition

CSPFormer: A cross-spatial pyramid transformer for visual place recognition

MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation

Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition

Local positional graphs and attentive local features for a data and runtime-efficient hierarchical place recognition pipeline