Abstract: In recent years, semantic segmentation has substantially advanced visual place recognition (VPR), because semantic information is relatively invariant to changes in appearance and viewpoint. However, extreme scenarios can involve semantic occlusion and semantic sparsity, and relying solely on semantic information for localization can then cause confusion. This paper therefore proposes a novel VPR framework that uses a coarse-to-fine image matching strategy and combines semantic and appearance information to improve performance. First, we construct SemLook global descriptors from semantic contours. These perform a preliminary screening of candidate images, which improves both the accuracy and the real-time performance of the algorithm. Building on this, we introduce SemLook local descriptors for fine screening, which combine robust appearance information extracted by deep learning with semantic information. These local descriptors address problems such as semantic overlap and sparsity in urban environments and further improve accuracy. This two-stage screening process lets the framework handle complex image matching in urban environments and obtain more accurate results. We evaluate SemLook descriptors on three public datasets (Extended-CMU Season, RobotCar Seasons v2, and SYNTHIA) and compare them with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, and Forest). The evaluation metrics are the area under the precision–recall curve (AUC), Recall@100%Precision, and Precision@100%Recall. Considering both these metrics and real-time performance, SemLook descriptors outperform the other six algorithms.
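The coarse-to-fine screening described above can be illustrated with a minimal Python sketch. It does not reproduce SemLook's descriptors: the global and local descriptors are placeholder vectors, cosine similarity stands in for the global comparison, and a mutual-nearest-neighbour count stands in for local matching. The functions `cosine`, `mutual_nn_count`, and `coarse_to_fine` and the parameter `top_k` are hypothetical names chosen for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mutual_nn_count(query_local: Sequence[Vector], ref_local: Sequence[Vector]) -> int:
    """Count mutual nearest-neighbour pairs between two sets of local descriptors.

    Hypothetical placeholder for the fine-stage matcher, not SemLook's own scoring.
    """
    if not query_local or not ref_local:
        return 0
    # Best reference feature for each query feature, and vice versa.
    q_to_r = [max(range(len(ref_local)), key=lambda j: cosine(q, ref_local[j])) for q in query_local]
    r_to_q = [max(range(len(query_local)), key=lambda i: cosine(query_local[i], r)) for r in ref_local]
    # A pair counts only if each feature is the other's nearest neighbour.
    return sum(1 for i, j in enumerate(q_to_r) if r_to_q[j] == i)


def coarse_to_fine(query_global: Vector, query_local: Sequence[Vector],
                   db_global: Sequence[Vector], db_local: Sequence[Sequence[Vector]],
                   top_k: int = 2) -> int:
    """Return the index of the best-matching database image."""
    # Coarse stage: rank every database image by global similarity and keep top_k.
    ranked = sorted(range(len(db_global)),
                    key=lambda i: cosine(query_global, db_global[i]), reverse=True)
    shortlist = ranked[:top_k]
    # Fine stage: re-rank only the shortlist by local matches (ties keep coarse order).
    return max(shortlist, key=lambda i: mutual_nn_count(query_local, db_local[i]))


# Toy data: three database images with placeholder descriptors.
db_global = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
db_local = [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
query_global = [1.0, 0.05, 0.0]
query_local = [[1.0, 0.0], [0.0, 1.0]]

print(coarse_to_fine(query_global, query_local, db_global, db_local, top_k=1))  # 0
print(coarse_to_fine(query_global, query_local, db_global, db_local, top_k=2))  # 1
```

In this toy example the coarse stage alone returns database image 0, while the fine stage re-ranks the two shortlisted images and selects image 1, which shares more local matches with the query. Image 2 is never compared at the local level, which is where a two-stage design saves computation.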
On the Extended-CMU Season dataset, SemLook descriptors achieve an AUC of 100%, and on the SYNTHIA dataset an AUC of 99%. These results indicate that using global descriptors for initial screening, followed by local descriptors that combine semantic and appearance information for precise matching, effectively addresses place recognition in scenes with semantic ambiguity or sparsity. The resulting descriptors are more accurate and robust under changes in appearance and viewpoint.
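The three metrics can be computed from each query's best-match score and whether that match is correct. The Python sketch below assumes a common VPR convention that the abstract does not spell out: every query has a true match in the database, an accepted correct match counts as a true positive, an accepted wrong match as a false positive, and a rejected query as a false negative. The function names and the trapezoidal AUC are illustrative choices, not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float], correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """(recall, precision) points from sweeping an acceptance threshold over match scores.

    Assumed convention: every query has a true match; accepted correct = TP,
    accepted wrong = FP, rejected = FN. Tied scores keep input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tp = 0
    points = []
    for accepted, i in enumerate(order, start=1):  # accept the most confident matches
        tp += 1 if correct[i] else 0
        fn = n - accepted  # rejected queries still have a true match
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((recall, tp / accepted))
    return points


def auc_pr(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the PR curve, starting at recall 0 with the first precision."""
    area, prev_r, prev_p = 0.0, 0.0, points[0][1]
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area


def recall_at_100_precision(points: List[Tuple[float, float]]) -> float:
    """Highest recall reached while precision is still exactly 1.0."""
    return max((r for r, p in points if p == 1.0), default=0.0)


def precision_at_100_recall(points: List[Tuple[float, float]]) -> float:
    """Precision at the first threshold where recall reaches 1.0."""
    return next((p for r, p in points if r == 1.0), 0.0)


pts = pr_points([0.9, 0.8, 0.7, 0.6], [True, True, False, True])
print(round(auc_pr(pts), 3), recall_at_100_precision(pts), precision_at_100_recall(pts))
# 0.875 0.5 0.75
```

Under this convention recall reaches 100% only when every query is accepted, so Precision@100%Recall equals the fraction of queries whose best match is correct, while Recall@100%Precision measures how many queries can be accepted before the first wrong match.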

A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Visual Place Recognition Based on Multilevel Descriptors for the Visually Impaired People

Explicit Feature Disentanglement for Visual Place Recognition Across Appearance Changes

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

Local positional graphs and attentive local features for a data and runtime-efficient hierarchical place recognition pipeline

Fast, Compact and Highly Scalable Visual Place Recognition through Sequence-based Matching of Overloaded Representations

MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

AnyLoc: Towards Universal Visual Place Recognition

Enhancing Visual Place Recognition Using Discrete Cosine Transform and Difference-Based Descriptors

Robust Visual Place Recognition for Severe Appearance Changes

A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

HVP-Net: A Hybrid Voxel- and Point-Wise Network for Place Recognition

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

Deep Homography Estimation for Visual Place Recognition

Convolutional MLP orthogonal fusion of multiscale features for visual place recognition

An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR

VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

Visual Place Recognition for Opposite Viewpoints and Environment Changes

Coarse-to-Fine Visual Place Recognition

A Hierarchical Utilization of Semantic Gradients and Scene Structure for Visual Place Recognition

Deja vu: Scalable Place Recognition Using Mutually Supportive Feature Frequencies