Abstract: Visual place recognition (VPR) is a challenging visual computing task in robot navigation. However, most existing methods, whether built on plain CNN features or on popular Transformer features, fail to learn the most salient features of place images because of the inconsistency problem common in VPR datasets, which limits model robustness and interpretability. In addition, existing state-of-the-art methods capture only general features of the original places with multi-scale CNN or Transformer features and ignore the texture characteristics present in place images, resulting in suboptimal recognition performance. To address these issues, we propose a novel visual place recognition network, named Texture-enhanced Cross-domain Attention Transformer (TECD_Attention). Specifically, a cross-attention Transformer first fuses deep attentive local and global features to improve the multi-scale feature representation of the recognition model. Second, a texture-enhanced cross-domain attention block constructs the final feature descriptor by fusing texture features with attentive local–global features. Then, a triplet loss function is used to match top-ranked reference places from the place database to a query place. Last, effective and efficient place re-ranking is achieved by training an adapted weakly supervised re-ranking network that relies on similarity computation between the query place and the top-ranked places. We evaluate our approach in extensive experiments on four challenging datasets. Our model achieves 96.2%, 94.6%, 95.9%, and 96.8% average recall in the top-1% candidate scenario on the Tokyo 24/7, Pitts250k, VPRiCE, and SUN397 datasets, respectively. Compared with existing state-of-the-art VPR methods, TECD_Attention delivers superior performance on robot place recognition in challenging environments.
We therefore conclude that TECD_Attention is a robust model for robot visual place recognition in challenging environments.
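
The retrieval step described in the abstract (a ranking loss that matches reference places to a query, followed by a recall measurement over top-ranked candidates) can be illustrated with a minimal sketch. It reads the ranking loss as a standard triplet margin loss, which is an assumption, and it uses plain Euclidean distance on toy descriptors. The function names, the margin value, and the data are hypothetical and are not taken from the paper; the sketch does not reproduce the TECD_Attention network or its re-ranking module.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(query, positive, negative, margin=0.1):
    """Hinge-style triplet loss (assumed form): zero once the positive is
    closer to the query than the negative by at least `margin`."""
    return max(0.0, l2_distance(query, positive) - l2_distance(query, negative) + margin)


def top_k(query, database, k):
    """Indices of the k database descriptors nearest to the query."""
    order = sorted(range(len(database)), key=lambda i: l2_distance(query, database[i]))
    return order[:k]


def recall_at_k(queries, database, ground_truth, k):
    """Fraction of queries with at least one true match among the top-k candidates."""
    hits = 0
    for query, matches in zip(queries, ground_truth):
        if set(top_k(query, database, k)) & set(matches):
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): three reference descriptors and two queries.
database = [[5.0, 5.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.0, 0.0], [5.0, 5.0]]
ground_truth = [[1], [2]]  # indices of the correct references for each query

print(triplet_loss(queries[0], database[1], database[0]))
print(recall_at_k(queries, database, ground_truth, k=1))
```

In a full system the descriptors would come from the trained network, and the top-k candidates would be passed to a re-ranking stage before the final recall is computed.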

Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

STA-VPR: Spatio-temporal Alignment for Visual Place Recognition

MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition

Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition

A Multi-Domain Feature Learning Method for Visual Place Recognition

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

Enhancing Visual Place Recognition Using Discrete Cosine Transform and Difference-Based Descriptors

PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Deep Homography Estimation for Visual Place Recognition

A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

A Training-Free, Lightweight Global Image Descriptor for Long-Term Visual Place Recognition Toward Autonomous Vehicles

TECD_Attention: Texture-enhanced and cross-domain attention modeling for visual place recognition

A Hierarchical Utilization of Semantic Gradients and Scene Structure for Visual Place Recognition

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Self-Supervised Domain Calibration and Uncertainty Estimation for Place Recognition