Abstract: Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors. Traditional localization approaches rely on high-definition (HD) maps, which consist of precisely annotated landmarks. However, building HD maps is expensive and difficult to scale. Given these limitations, leveraging navigation maps has emerged as a promising low-cost alternative for localization. Current approaches based on navigation maps can achieve highly accurate localization, but their complex matching strategies lead to unacceptable inference latency that fails to meet real-time demands. To address these limitations, we propose a novel transformer-based neural re-localization method. Inspired by image registration, our approach performs a coarse-to-fine neural feature registration between navigation map features and visual bird's-eye-view features. Our method significantly outperforms the current state-of-the-art OrienterNet on both the nuScenes and Argoverse datasets, with improvements of nearly 10%/20% in localization accuracy and 30/16 FPS in the single-view and surround-view input settings, respectively. We highlight that our research presents an HD-map-free localization method for autonomous driving, offering cost-effective, reliable, and scalable performance in challenging driving environments.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper "MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps" addresses robust localization for autonomous driving, especially in urban environments where GPS signals are affected by multipath errors. Traditional localization methods rely on high-definition (HD) maps, which contain precisely annotated landmarks, but these maps are expensive to build and maintain and are difficult to scale. Navigation maps have therefore become a promising low-cost alternative for localization. Existing navigation-map-based methods can achieve high-precision localization, but their complex matching strategies cause unacceptable inference latency and cannot meet real-time requirements.

To address this, the authors propose MapLocNet, a new Transformer-based neural re-localization method. It aligns navigation map features with visual bird's-eye-view (BEV) features through coarse-to-fine neural feature registration, which significantly improves inference speed while maintaining high precision. The main contributions of MapLocNet are as follows:

1. **Propose MapLocNet**: By fusing surround-view images with navigation maps, the method achieves high-precision localization. It is especially suited to areas with poor GPS signals, where it corrects significant position drift.
2. **Introduce a hierarchical coarse-to-fine feature registration strategy**: The strategy aligns BEV features with map features and improves both localization accuracy and inference speed over existing methods.
3. **Develop new training criteria**: Perception tasks serve as auxiliary targets for pose prediction, which lets MapLocNet achieve state-of-the-art localization accuracy on the nuScenes and Argoverse datasets.

Overall, this research provides a reliable, efficient, and scalable localization method that does not require high-definition maps and is suitable for complex driving environments.
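The coarse-to-fine idea itself can be illustrated with a small NumPy sketch. This is not MapLocNet's Transformer-based implementation: it assumes a translation-only offset (no rotation), single-channel feature grids, and a brute-force correlation search, and every function name and parameter here is hypothetical. It only shows why searching a downsampled grid first and then refining locally needs far fewer comparisons than one exhaustive full-resolution search.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a 2D array by an integer factor (sides must divide evenly)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def best_shift(bev, map_feat, candidates):
    """Return the (dy, dx) candidate whose circular shift of bev correlates best with map_feat."""
    best, best_score = None, -np.inf
    for dy, dx in candidates:
        shifted = np.roll(bev, (dy, dx), axis=(0, 1))
        score = float((shifted * map_feat).sum())
        if score > best_score:
            best, best_score = (dy, dx), score
    return best

def coarse_to_fine_register(bev, map_feat, factor=4, coarse_radius=4):
    """Estimate the integer shift aligning bev to map_feat in two stages (illustrative only)."""
    # Coarse stage: search a small window on the downsampled grids.
    coarse_range = range(-coarse_radius, coarse_radius + 1)
    coarse_candidates = [(dy, dx) for dy in coarse_range for dx in coarse_range]
    cy, cx = best_shift(downsample(bev, factor), downsample(map_feat, factor),
                        coarse_candidates)

    # Fine stage: refine at full resolution around the upscaled coarse estimate.
    fine_range = range(-factor, factor + 1)
    fine_candidates = [(factor * cy + dy, factor * cx + dx)
                       for dy in fine_range for dx in fine_range]
    return best_shift(bev, map_feat, fine_candidates)

# Toy data: a smooth blob stands in for map features; the "BEV" is the same
# blob displaced by a known offset, so the aligning shift is the negation.
yy, xx = np.mgrid[0:64, 0:64]
map_feat = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 6.0 ** 2))
bev = np.roll(map_feat, (-6, 9), axis=(0, 1))
print(coarse_to_fine_register(bev, map_feat))
```

With these defaults the two stages evaluate 81 + 81 candidate shifts, whereas a single full-resolution search over the same ±16 pixel window would evaluate 33 × 33 = 1089.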

MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Long-Term Map-Based Visual Localization: Analysis of Individual Components of a Hierarchical Pipeline

Visual Localizer: Outdoor Localization Based on ConvNet Descriptor and Global Optimization for Visually Impaired Pedestrians

Multimodal Localization: Stereo over LiDAR Map

Geo-Localization with Transformer-Based 2D-3D Match Network

3D LiDAR-Based Global Localization Using Siamese Neural Network

LocNet: Global Localization in 3D Point Clouds for Mobile Robots

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

LESS-Map: Lightweight and Evolving Semantic Map in Parking Lots for Long-term Self-Localization

MoviNet: A novel network for cross-modal map extraction by vision transformer and CNN

Persistent Stereo Visual Localization on Cross-Modal Invariant Map

Crossview Mapping with Graph-based Geolocalization on City-Scale Street Maps

Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations

Visual Localization in a Prior 3D LiDAR Map Combining Points and Lines

OrienterNet: Visual Localization in 2D Public Maps with Neural Matching

Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge

High Precision Vehicle Localization Based on Tightly-coupled Visual Odometry and Vector HD Map

Monocular Localization with Semantics Map for Autonomous Vehicles

Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image

Learning Visual Semantic Map-Matching for Loosely Multi-Sensor Fusion Localization of Autonomous Vehicles