Abstract: Visual place recognition (VPR) is a challenging visual computing task in robot navigation. Most existing methods, whether they rely on simple CNN features or popular Transformer features, fail to learn the most salient features of place images because of the inconsistency problem common in VPR datasets, which limits the robustness and interpretability of the resulting models. In addition, existing state-of-the-art methods capture only general features of the original places with multi-scale CNN or Transformer features and ignore the texture characteristics present in place images, which leads to suboptimal recognition performance. To address these issues, we propose a novel visual place recognition network, the Texture-enhanced Cross-domain Attention Transformer (TECD_Attention). Specifically, a cross-attention Transformer first fuses deep attentive local and global features to improve the multi-scale feature representation of the recognition model. Second, a texture-enhanced cross-domain attention block constructs the final feature descriptor by fusing texture features with the attentive local–global features. Third, a triplet loss is used to match top-ranked reference places from the place database to a query place. Finally, effective and efficient place re-ranking is achieved by training an adapted weakly supervised re-ranking network based on the similarity between the query place and the top-ranked places. We evaluate our approach in extensive experiments on four challenging datasets. Our model achieves average recall of 96.2%, 94.6%, 95.9%, and 96.8% in the top-1% candidate scenario on the Tokyo 24/7, Pitts250k, VPRiCE, and SUN397 datasets, respectively. Compared with existing state-of-the-art VPR methods, TECD_Attention performs better on robot place recognition in challenging environments, which indicates that it is a robust model for robot visual place recognition.
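The idea of fusing local and global features with cross-attention can be illustrated with a minimal NumPy sketch. The sketch makes several assumptions: local features are N patch tokens (for example a flattened CNN feature map), the global feature is a single pooled token, queries come from the global stream, and keys and values come from the local stream. The random projection matrices, the dimensions, the residual-add fusion, and the helper names (`cross_attention`, `fuse_local_global`) are hypothetical. The abstract does not specify the actual TECD_Attention layers, so this is only a generic single-head cross-attention.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (M, d) tokens that attend (here: global tokens).
    context: (N, d) tokens attended over (here: local patch tokens).
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical, random here).
    Returns (M, d_k) attended features and the (M, N) attention weights.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the N local tokens
    return weights @ v, weights


def fuse_local_global(local_tokens, global_tokens, rng):
    """Hypothetical fusion: global tokens attend to local tokens, then a residual add."""
    d = global_tokens.shape[-1]
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    attended, weights = cross_attention(global_tokens, local_tokens, w_q, w_k, w_v)
    fused = global_tokens + attended  # residual connection keeps the original global signal
    # L2-normalize so descriptors can be compared with a dot product (cosine similarity).
    fused = fused / np.linalg.norm(fused, axis=-1, keepdims=True)
    return fused, weights


rng = np.random.default_rng(0)
local_tokens = rng.standard_normal((49, 64))  # e.g. a 7x7 feature map flattened to 49 tokens
global_tokens = rng.standard_normal((1, 64))  # e.g. one pooled global token
fused, weights = fuse_local_global(local_tokens, global_tokens, rng)
print(fused.shape, weights.shape)  # (1, 64) (1, 49)
```

Letting the global token query the local tokens lets it pick out the most relevant regions of the image. A multi-head, learned version of this layer is the usual choice in practice.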

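The triplet-loss objective and the top-1% recall metric reported in the abstract can be sketched as follows. The sketch assumes several details that the abstract does not state: L2-normalized descriptors, Euclidean distance in the loss, a margin of 0.1, and a counting rule in which a query is correct when at least one true match appears among its top k = ceil(1% of the database size) retrieved references. These are common VPR conventions, not the authors' exact protocol. The function names are hypothetical.

```python
import math

import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Triplet margin loss averaged over a batch of (B, D) descriptors.

    Pushes each anchor closer to its positive than to its negative by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def recall_at_top_percent(query_desc, db_desc, positives, percent=1.0):
    """Fraction of queries with a true match among the top `percent`% of the database.

    query_desc: (Q, D) and db_desc: (N, D) L2-normalized descriptors.
    positives: list of sets; positives[i] holds the database indices that count as
    correct matches for query i (how they are defined, e.g. a GPS radius, is assumed).
    """
    n_db = db_desc.shape[0]
    k = max(1, math.ceil(n_db * percent / 100.0))
    # For unit vectors, the largest dot product is the smallest Euclidean distance.
    sims = query_desc @ db_desc.T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = [bool(positives[i] & set(top_k[i].tolist())) for i in range(len(positives))]
    return float(np.mean(hits))


# Toy usage: queries are noisy copies of database entries 0..9.
rng = np.random.default_rng(0)
db = rng.standard_normal((200, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)
queries = db[:10] + 0.05 * rng.standard_normal((10, 32))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
positives = [{i} for i in range(10)]
recall = recall_at_top_percent(queries, db, positives)  # k = 2 candidates for 200 entries
print(recall)
```

Ranking by dot product on unit-norm descriptors gives the same order as ranking by Euclidean distance. This keeps the retrieval step consistent with the distance used in the triplet loss.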
TECD_Attention: Texture-enhanced and cross-domain attention modeling for visual place recognition
