Abstract: Extracting robust and discriminative local features from images plays a vital role in long term visual localization, whose challenges stem mainly from the severe appearance differences between matching images caused by day-night illumination, seasonal changes, and human activities. Existing solutions jointly learn keypoints and their descriptors in an end-to-end manner, relying on a large number of point-correspondence annotations harvested from structure-from-motion and depth estimation algorithms. While these methods outperform non-deep methods and two-stage deep methods (i.e., detection followed by description), they still struggle to overcome the problems encountered in long term visual localization. Since the intrinsic semantics are invariant to local appearance changes, this paper proposes to learn semantic-aware local features in order to improve the robustness of local feature matching for long term localization. Based on a state-of-the-art CNN architecture for local feature learning, i.e., ASLFeat, this paper leverages the semantic information from an off-the-shelf semantic segmentation network to learn semantic-aware feature maps. The learned correspondence-aware feature descriptors and semantic features are then merged to form the final feature descriptors, which show improved feature matching ability in experiments. In addition, the learned semantics embedded in the features can be further used to filter out noisy keypoints, leading to additional accuracy improvement and faster matching speed. Experiments on two popular long term visual localization benchmarks (Aachen Day and Night v1.1, RobotCar Seasons) and one challenging indoor benchmark (InLoc) demonstrate encouraging improvements in localization accuracy over its counterpart and other competitive methods.
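
The two ideas in the abstract, merging correspondence-aware descriptors with semantic features and using semantics to discard noisy keypoints, can be illustrated with a minimal sketch. The abstract does not specify how the merge or the filtering is done, so everything here is an assumption made for illustration: the fusion rule (concatenation followed by L2 normalization), the set of unstable classes, and the function names `merge_descriptor` and `filter_keypoints` are hypothetical and are not taken from the paper or from ASLFeat.

```python
import math

# Hypothetical label set: classes assumed to be unreliable for localization.
# The abstract does not say which classes are filtered.
UNSTABLE_CLASSES = {"sky", "person", "car"}


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def merge_descriptor(corr_desc, sem_feat):
    """Assumed fusion rule: normalize each part, concatenate, renormalize."""
    return l2_normalize(l2_normalize(corr_desc) + l2_normalize(sem_feat))


def filter_keypoints(keypoints, labels, unstable=UNSTABLE_CLASSES):
    """Keep only keypoints whose semantic label is not in the unstable set."""
    return [kp for kp, lab in zip(keypoints, labels) if lab not in unstable]


if __name__ == "__main__":
    merged = merge_descriptor([3.0, 4.0], [0.0, 2.0])
    print(len(merged))  # dimension of the merged descriptor
    kept = filter_keypoints([(1, 2), (3, 4), (5, 6)], ["building", "sky", "road"])
    print(kept)  # keypoints that survive the semantic filter
```

Removing keypoints before matching reduces the number of descriptor comparisons, which is consistent with the faster matching speed the abstract reports, although the actual filtering criterion used in the paper may differ.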

Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization

Persistent Stereo Visual Localization on Cross-Modal Invariant Map

Long-Term Map-Based Visual Localization: Analysis of Individual Components of a Hierarchical Pipeline

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Communication Constrained Cloud-Based Long-Term Visual Localization in Real Time

Laser Map Aided Visual Inertial Localization in Changing Environment

2-Entity Random Sample Consensus for Robust Visual Localization: Framework, Methods, and Verifications

Visual Localizer: Outdoor Localization Based on ConvNet Descriptor and Global Optimization for Visually Impaired Pedestrians

2-Entity RANSAC for Robust Visual Localization in Changing Environment

Visual-Inertial Localization And Map Summarization Based On Prior Map

From Satellite to Ground: Satellite Assisted Visual Localization with Cross-view Semantic Matching

Learning Visual Semantic Map-Matching for Loosely Multi-Sensor Fusion Localization of Autonomous Vehicles

Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image

Learning a Robust Hybrid Descriptor for Robot Visual Localization

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization

Learning robust representation and sequence constraint for retrieval-based long-term visual place recognition

Scene Retrieval for Contextual Visual Mapping

CyberLoc: Towards Accurate Long-term Visual Localization

Robust Visual Localization Across Seasons

Learning Semantic-Aware Local Features for Long Term Visual Localization