Abstract: In recent years, semantic segmentation has driven significant progress in visual place recognition (VPR) by providing semantic information that is relatively invariant to appearance and viewpoint. However, in some extreme scenarios, semantic occlusion and semantic sparsity can cause confusion when localization relies on semantic information alone. This paper therefore proposes a novel VPR framework that uses a coarse-to-fine image matching strategy and combines semantic and appearance information to improve performance. First, we construct SemLook global descriptors from semantic contours; these preliminarily screen candidate images, improving both the accuracy and the real-time performance of the algorithm. Building on this, we introduce SemLook local descriptors for fine screening, which combine robust appearance information extracted by deep learning with semantic information. These local descriptors address issues such as semantic overlap and sparsity in urban environments, further improving accuracy. Through this refined screening process, the framework handles the challenges of complex image matching in urban environments and obtains more accurate results. The SemLook descriptors are evaluated on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compared with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, and Forest). The evaluation metrics are the area under the precision-recall curve (AUC), Recall@100%Precision, and Precision@100%Recall. Considering both real-time performance and these metrics, the SemLook descriptors outperform the other six algorithms.
On the Extended-CMU Season dataset, the SemLook descriptors achieve an AUC of 100%, and on the SYNTHIA dataset they achieve an AUC of 99%. The experimental results indicate that using global descriptors for initial screening, followed by local descriptors that combine semantic and appearance information for precise matching, effectively addresses place recognition in scenarios with semantic ambiguity or sparsity. The resulting descriptors are more accurate and more robust in scenes with variations in appearance and viewpoint.
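
The coarse-to-fine strategy described in the abstract can be illustrated with a minimal sketch. This is not the SemLook implementation: the abstract does not specify how the descriptors are built or compared, so the sketch assumes each image is already reduced to one global vector and one local vector, and that both stages use cosine similarity. The names `cosine`, `coarse_to_fine_match`, `top_k`, and the dictionary keys `"global"` and `"local"` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_to_fine_match(query, database, top_k=2):
    """Return (best_index, fine_score) for one query image.

    Assumption: `query` and every entry of `database` are dicts with the
    hypothetical keys "global" and "local", each holding a plain list of
    floats. The real SemLook descriptors are not defined in the abstract.
    """
    # Coarse stage: rank every reference image by global-descriptor similarity.
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(query["global"], database[i]["global"]),
        reverse=True,
    )
    candidates = ranked[:top_k]

    # Fine stage: re-score only the shortlisted candidates with the local descriptor.
    best = max(candidates, key=lambda i: cosine(query["local"], database[i]["local"]))
    return best, cosine(query["local"], database[best]["local"])


if __name__ == "__main__":
    # Toy data: entry 0 wins the coarse stage, entry 1 wins the fine stage.
    database = [
        {"global": [1.0, 0.0], "local": [0.0, 1.0]},
        {"global": [0.9, 0.1], "local": [1.0, 0.0]},
        {"global": [0.0, 1.0], "local": [1.0, 0.0]},
    ]
    query = {"global": [1.0, 0.0], "local": [1.0, 0.0]}
    print(coarse_to_fine_match(query, database, top_k=2))
```

In this toy data the shortlist size matters: with `top_k=2` the fine stage can overturn the coarse ranking, while with `top_k=1` it cannot. Entry 2 is never considered because the coarse stage filters it out, which shows the trade-off between speed and recall that any shortlist-based scheme has to balance.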

Semantic-focused Patch Tokenizer with Multi-branch Mixer for Visual Place Recognition

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

A Hierarchical Utilization of Semantic Gradients and Scene Structure for Visual Place Recognition

Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition

A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

MixVPR: Feature Mixing for Visual Place Recognition

Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition

An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR

SE-VPR: Semantic Enhanced VPR Approach for Visual Localization

SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning

Semantics-enhanced discriminative descriptor learning for LiDAR-based place recognition

SC_LPR: Semantically Consistent LiDAR Place Recognition Based on Chained Cascade Network in Long-Term Dynamic Environments

Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data