Abstract: In recent years, semantic segmentation has advanced visual place recognition (VPR) considerably, because semantic information is relatively invariant to changes in appearance and viewpoint. However, in some extreme scenarios, semantic occlusion and semantic sparsity can cause confusion when localization relies on semantic information alone. This paper therefore proposes a novel VPR framework that uses a coarse-to-fine image matching strategy and combines semantic and appearance information to improve performance. First, we construct SemLook global descriptors from semantic contours; these are used to pre-screen candidate images, which improves both the accuracy and the real-time performance of the algorithm. We then introduce SemLook local descriptors for fine screening, which combine robust appearance information extracted by deep learning with semantic information. These local descriptors address issues such as semantic overlap and sparsity in urban environments, further improving accuracy. This two-stage screening handles the challenges of complex image matching in urban environments and yields more accurate results. We evaluate SemLook descriptors on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compare them with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, and Forest). The evaluation metrics are the area under the precision–recall curve (AUC), Recall@100%Precision, and Precision@100%Recall. Considering both real-time performance and these metrics, SemLook descriptors outperform the other six algorithms.
On the Extended-CMU Season dataset, SemLook descriptors achieve an AUC of 100%, and on the SYNTHIA dataset they achieve an AUC of 99%. The experimental results indicate that using global descriptors for initial screening, followed by local descriptors that combine semantic and appearance information for precise matching, effectively addresses place recognition in scenes with semantic ambiguity or sparsity. The approach makes the descriptors more accurate and more robust in scenes with variations in appearance and viewpoint.
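
The coarse-to-fine matching strategy described in the abstract can be illustrated with a minimal sketch. It is not the SemLook implementation: the abstract does not specify how the descriptors are compared, so the cosine-similarity shortlist, the mean best-match local score, and all function and parameter names (`coarse_candidates`, `local_score`, `coarse_to_fine_match`, `top_k`) are assumptions chosen for illustration. Descriptors are assumed to be precomputed NumPy arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def coarse_candidates(query_global, db_globals, top_k):
    """Coarse stage (assumed): shortlist the top_k database images
    by cosine similarity between global descriptors."""
    sims = l2_normalize(db_globals) @ l2_normalize(query_global)
    return np.argsort(-sims)[:top_k]


def local_score(query_locals, cand_locals):
    """Fine stage score (assumed): for each query local descriptor take its
    best cosine match in the candidate image, then average over the query."""
    sims = l2_normalize(query_locals) @ l2_normalize(cand_locals).T
    return float(sims.max(axis=1).mean())


def coarse_to_fine_match(query_global, query_locals, db_globals, db_locals, top_k=5):
    """Re-rank the coarse shortlist with local descriptors and
    return the index of the best-scoring database image."""
    shortlist = coarse_candidates(query_global, db_globals, top_k)
    scores = [local_score(query_locals, db_locals[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(scores))])
```

In this sketch only the `top_k` shortlisted images are compared with the more expensive local descriptors, which is the usual reason a coarse stage helps real-time performance; the value of `top_k` trades speed against the risk of discarding the correct place too early.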

A Training-Free, Lightweight Global Image Descriptor for Long-Term Visual Place Recognition Toward Autonomous Vehicles

A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

Context for LiDAR-based Place Recognition

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

SSC: Semantic Scan Context for Large-Scale Place Recognition

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition

A Hierarchical Utilization of Semantic Gradients and Scene Structure for Visual Place Recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions

An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR

Learning robust representation and sequence constraint for retrieval-based long-term visual place recognition

Context-Based Visual-Language Place Recognition

SE-VPR: Semantic Enhanced VPR Approach for Visual Localization

Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition

VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

A Novel Place Recognition Network Using Visual Sequences and LiDAR Point Clouds for Autonomous Vehicles