Abstract: Extracting robust and discriminative local features from images plays a vital role in long-term visual localization, whose main challenge is the severe appearance difference between matching images caused by day-night illumination, seasonal changes, and human activities. Existing solutions jointly learn keypoints and their descriptors in an end-to-end manner, relying on large numbers of point-correspondence annotations harvested from structure-from-motion and depth estimation algorithms. While these methods outperform non-deep methods and two-stage deep methods (detection followed by description), they still struggle with the problems encountered in long-term visual localization. Since intrinsic semantics are invariant to local appearance changes, this paper proposes to learn semantic-aware local features to improve the robustness of local feature matching for long-term localization. Building on a state-of-the-art CNN architecture for local feature learning, ASLFeat, this paper leverages semantic information from an off-the-shelf semantic segmentation network to learn semantic-aware feature maps. The learned correspondence-aware feature descriptors and semantic features are then merged to form the final feature descriptors, which show improved feature matching ability in experiments. In addition, the learned semantics embedded in the features can be used to filter out noisy keypoints, leading to further accuracy improvements and faster matching. Experiments on two popular long-term visual localization benchmarks (Aachen Day and Night v1.1, RobotCar Seasons) and one challenging indoor benchmark (InLoc) demonstrate encouraging improvements in localization accuracy over its counterpart and other competitive methods.
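
The two ideas in the abstract, merging a local descriptor with a semantic feature and discarding keypoints by semantic class, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state how the two features are merged, so concatenation followed by L2 normalization is assumed here, and the label ids in `UNSTABLE_LABELS`, along with all function names, are hypothetical.

```python
import math

# Hypothetical label ids for classes assumed unstable over time
# (toy labeling, not taken from the paper): 0 = sky, 1 = vehicle.
UNSTABLE_LABELS = {0, 1}


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def merge_descriptors(local_desc, semantic_feat):
    """Assumed merge: normalize each part, concatenate, then renormalize."""
    merged = l2_normalize(local_desc) + l2_normalize(semantic_feat)
    return l2_normalize(merged)


def filter_keypoints(keypoints, labels, unstable=UNSTABLE_LABELS):
    """Keep only keypoints whose semantic label is not in the unstable set."""
    return [kp for kp, lab in zip(keypoints, labels) if lab not in unstable]


# Toy inputs: three keypoints with one semantic label each.
keypoints = [(10, 12), (40, 8), (25, 30)]
labels = [0, 2, 3]

kept = filter_keypoints(keypoints, labels)         # drops the keypoint labeled 0
desc = merge_descriptors([3.0, 4.0], [0.0, 2.0])   # 4-D unit-length descriptor
```

Filtering before matching shrinks the candidate set, which is one plausible reading of why the abstract reports both higher accuracy and faster matching.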

Double-domain Adaptation Semantics for Retrieval-based Long-term Visual Localization

When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation

DASGIL: Domain Adaptation for Semantic and Geometric-Aware Image-Based Localization

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic Segmentation

Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization

Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization

Domain Adaptation for Remote Sensing Image Semantic Segmentation: An Integrated Approach of Contrastive Learning and Adversarial Learning

Learning robust representation and sequence constraint for retrieval-based long-term visual place recognition

Learning Semantic-Aware Local Features for Long Term Visual Localization

Dual Path Learning for Domain Adaptation of Semantic Segmentation

Domain Alignment with Large Vision-language Models for Cross-domain Remote Sensing Image Retrieval

Cross-Modality Domain Adaptation for Freespace Detection: A Simple Yet Effective Baseline

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

Unsupervised Domain Adaptation for Referring Semantic Segmentation

Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation

Domain Adaptation for Semantic Segmentation of Road Scenes Via Two-Stage Alignment of Traffic Elements

A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models

Fully Convolutional Adaptation Networks for Semantic Segmentation

Unsupervised Domain Adaptation Multi-Level Adversarial Network for Semantic Segmentation Based on Multi-Modal Features

A Fine-Grained Unsupervised Domain Adaptation Framework for Semantic Segmentation of Remote Sensing Images