Abstract: In recent years, semantic segmentation has driven significant progress in visual place recognition (VPR) by supplying semantic information that is relatively invariant to changes in appearance and viewpoint. However, in some extreme scenarios, semantic occlusion and semantic sparsity can cause confusion when localization relies solely on semantic information. This paper therefore proposes a novel VPR framework that employs a coarse-to-fine image matching strategy and combines semantic and appearance information to improve performance. First, we construct SemLook global descriptors from semantic contours and use them to preliminarily screen candidate images, which improves both the accuracy and the real-time performance of the algorithm. We then introduce SemLook local descriptors for fine screening; these combine robust appearance information extracted by deep learning with semantic information. The local descriptors address issues such as semantic overlap and sparsity in urban environments, further improving accuracy, so that the two-stage screening can handle complex image matching in urban scenes. The performance of SemLook descriptors is evaluated on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compared with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, Forest). Considering both real-time performance and the evaluation metrics, the SemLook descriptors outperform the other six algorithms. The evaluation metrics are the area under the precision–recall curve (AUC), Recall@100%Precision, and Precision@100%Recall.
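The three evaluation metrics can be illustrated with a minimal sketch. The abstract does not give the exact evaluation protocol, so the definitions below are assumptions: each query contributes one best-match similarity score and a binary correctness label, the precision–recall curve is obtained by sweeping a threshold over the scores, and AUC is computed with the trapezoidal rule. The function names (`pr_curve`, `pr_auc`, `recall_at_100_precision`, `precision_at_100_recall`) are hypothetical.

```python
import numpy as np

def pr_curve(scores, labels):
    """Precision and recall at every score threshold (assumed protocol)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels = labels[order]
    tp = np.cumsum(labels)        # true positives accepted so far
    fp = np.cumsum(1.0 - labels)  # false positives accepted so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall

def pr_auc(precision, recall):
    """Trapezoidal area under the PR curve, starting from recall = 0."""
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([precision[0]], precision))
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def recall_at_100_precision(precision, recall):
    """Highest recall reached while precision is still exactly 1."""
    mask = precision == 1.0
    return float(recall[mask].max()) if mask.any() else 0.0

def precision_at_100_recall(precision, recall):
    """Best precision among thresholds that retrieve every true match."""
    return float(precision[recall == 1.0].max())

# Toy example: five queries, three of them matched correctly.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 1, 0, 1, 0]
p, r = pr_curve(toy_scores, toy_labels)
print(pr_auc(p, r), recall_at_100_precision(p, r), precision_at_100_recall(p, r))
```

Under these assumed definitions, an AUC of 100% means that every correct match is ranked above every incorrect one.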
On the Extended-CMU Season dataset, SemLook descriptors achieve an AUC of 100%, and on the SYNTHIA dataset they achieve an AUC of 99%. The experimental results indicate that using global descriptors for initial screening, followed by local descriptors that combine semantic and appearance information for precise matching, can effectively address place recognition in scenarios with semantic ambiguity or sparsity. The approach improves descriptor performance, making it more accurate and robust in scenes with variations in appearance and viewpoint.
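The coarse-to-fine strategy described in the abstract can be sketched generically as follows. This is not the SemLook implementation: the descriptors here are random placeholder vectors, cosine similarity is an assumed matching score, and the names `cosine_sim`, `coarse_to_fine_match`, and `top_k` are hypothetical. The sketch only shows the control flow, in which a cheap global comparison shortlists candidates and a more expensive local comparison re-ranks that shortlist.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-12)
    return a @ b.T

def coarse_to_fine_match(q_global, q_local, db_global, db_local, top_k=5):
    """Return (best index, shortlisted indices, fine scores of the shortlist).

    q_global:  (G,)       global descriptor of the query image
    q_local:   (L, D)     local descriptors of the query image
    db_global: (N, G)     global descriptors of the reference images
    db_local:  (N, L, D)  local descriptors of the reference images
    """
    # Coarse stage: rank all reference images by global similarity.
    coarse = cosine_sim(q_global[None, :], db_global)[0]
    candidates = np.argsort(-coarse)[:top_k]

    # Fine stage: re-rank only the shortlist with local descriptors.
    # Score = mean, over query local descriptors, of the best match found
    # in the candidate image (an assumed scoring rule).
    fine_scores = np.array([
        cosine_sim(q_local, db_local[idx]).max(axis=1).mean()
        for idx in candidates
    ])
    best = int(candidates[int(np.argmax(fine_scores))])
    return best, candidates, fine_scores

# Toy usage with random placeholder descriptors.
rng = np.random.default_rng(0)
db_g = rng.normal(size=(20, 16))
db_l = rng.normal(size=(20, 8, 32))
query_g = db_g[7] + 0.01 * rng.normal(size=16)      # noisy copy of image 7
query_l = db_l[7] + 0.01 * rng.normal(size=(8, 32))
best, shortlist, scores = coarse_to_fine_match(query_g, query_l, db_g, db_l)
print(best, shortlist)
```

Restricting the fine stage to `top_k` candidates is what keeps the expensive local comparison compatible with real-time operation. The trade-off is that a correct match dropped in the coarse stage cannot be recovered later.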

A Hierarchical Utilization of Semantic Gradients and Scene Structure for Visual Place Recognition

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

Semantic Graph Based Place Recognition for 3D Point Clouds

SE-VPR: Semantic Enhanced VPR Approach for Visual Localization

A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

SSC: Semantic Scan Context for Large-Scale Place Recognition

SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR

Semantic-focused Patch Tokenizer with Multi-branch Mixer for Visual Place Recognition

A Training-Free, Lightweight Global Image Descriptor for Long-Term Visual Place Recognition Toward Autonomous Vehicles

GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving

Context-Based Visual-Language Place Recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition

Simple and Effective Visual Place Recognition Via Spiking Neural Networks and Deep Information

Structured Pruning for Efficient Visual Place Recognition