Abstract: Globally-consistent localization in urban environments is crucial for autonomous systems such as self-driving vehicles and drones, as well as assistive technologies for visually impaired people. Traditional Visual-Inertial Odometry (VIO) and Visual Simultaneous Localization and Mapping (VSLAM) methods, though adequate for local pose estimation, suffer from drift in the long term due to reliance on local sensor data. While GPS counteracts this drift, it is unavailable indoors and often unreliable in urban areas. An alternative is to localize the camera to an existing 3D map using visual-feature matching. This can provide centimeter-level accurate localization but is limited by the visual similarities between the current view and the map. This paper introduces a novel approach that achieves accurate and globally-consistent localization by aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching; no visual data association is needed. The proposed method provides a 6-DoF global measurement tightly integrated into the VIO/VSLAM system. Experiments run on a high-fidelity GPS simulator and real-world data collected from a drone demonstrate that our approach outperforms state-of-the-art VIO-GPS systems and offers superior robustness against viewpoint changes compared to the state-of-the-art Visual SLAM systems.

What problem does this paper attempt to address?

The paper addresses drift-free, globally consistent localization in urban environments, which is crucial for autonomous systems such as self-driving vehicles and drones, and for assistive technologies for visually impaired people. Visual-inertial odometry (VIO) and visual simultaneous localization and mapping (VSLAM) provide adequate local pose estimates, but because they rely only on local sensor data, they drift over the long term. GPS can counteract this drift, but it is unavailable indoors and often unreliable in urban areas. The authors propose aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching, which yields accurate, globally consistent localization without visual data association. The resulting 6-DoF global measurement is tightly integrated into the VIO/VSLAM system, reducing drift and improving robustness to viewpoint changes.

### Key Problem Summary:

1. **Long-term Drift Problem**: Traditional VIO and VSLAM methods accumulate error during long-term operation, resulting in inaccurate localization.
2. **GPS Limitations**: GPS is unavailable indoors and often unreliable in urban areas, so it cannot provide continuous global localization.
3. **Limitations of Visual Feature Matching**: Localization by visual feature matching depends on the visual similarity between the current view and the map, so it performs poorly when the viewpoint or environmental conditions differ.

### Proposed Solutions:

- Use point-to-plane matching to align the sparse 3D point cloud generated by VIO/VSLAM with the digital twin model.
- The method requires no visual data association, which reduces its sensitivity to viewpoint changes.
- Experiments on a high-fidelity GPS simulator and on real-world data collected from a drone show that the method outperforms state-of-the-art VIO-GPS systems.

### Formula Representation:

- The objective function of point-to-plane matching:

  \[ \min_{\delta R, \delta p} J=\sum_{j \in J_k}([\delta R\cdot a_j+\delta p - b_j]\cdot n_j)^2 \]

  where $\delta R$ and $\delta p$ are the rotation and translation parameters respectively, $a_j$ is a point in the source point cloud, $b_j$ is a point in the target point cloud, and $n_j$ is the normal vector at $b_j$.
- Adaptive weighting strategy:

  \[ W_k^m = \beta\frac{\exp\left(-\frac{\gamma^2}{2}\right)}{\sigma^2}\cdot H \]

  where $\gamma$ is the root-mean-square error of the inliers, $\sigma^2$ is the sum of the diagonal elements of the Hessian matrix, $\beta$ is a scaling factor, and $H$ is a covariance matrix.

With these components, the authors report globally consistent localization in urban environments that outperforms state-of-the-art VIO-GPS systems and is more robust to viewpoint changes than state-of-the-art visual SLAM systems.
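To make the two formulas above concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes correspondences between source points $a_j$, target points $b_j$, and normals $n_j$ are already given, and it linearizes the rotation with a small-angle approximation so that the point-to-plane cost becomes an ordinary least-squares problem. The function names (`skew`, `rotation_from_vector`, `point_to_plane_step`, `adaptive_weight`) are hypothetical. Using the Gauss-Newton Hessian of the alignment problem for both $\sigma^2$ and $H$ is an assumption made here; the summary describes the two separately and does not say how $H$ is obtained.

```python
import numpy as np


def skew(v):
    """Return the 3x3 matrix [v]_x such that [v]_x @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def rotation_from_vector(w):
    """Rodrigues formula: rotation matrix for the rotation vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def point_to_plane_step(a, b, n):
    """One linearized least-squares step for the point-to-plane cost.

    a: (N, 3) source points, b: (N, 3) matched target points,
    n: (N, 3) unit normals at the target points.
    Assumes a small rotation, delta_R ~= I + [w]_x, so the residual
    r_j = (a_j + w x a_j + dp - b_j) . n_j is linear in x = [w, dp].
    """
    # (w x a_j) . n_j == w . (a_j x n_j), so each Jacobian row is [a_j x n_j, n_j].
    A = np.hstack([np.cross(a, n), n])
    r0 = np.einsum("ij,ij->i", a - b, n)   # residuals at the identity transform
    hessian = A.T @ A                      # Gauss-Newton approximation of the Hessian
    x = np.linalg.solve(hessian, -A.T @ r0)
    residuals = A @ x + r0                 # linearized residuals after the step
    return x[:3], x[3:], hessian, residuals


def adaptive_weight(residuals, hessian, beta=1.0):
    """W = beta * exp(-gamma^2 / 2) / sigma^2 * H, following the formula as written.

    Assumption: all residuals passed in are inliers, and the same matrix
    serves as both the Hessian (for sigma^2) and H.
    """
    gamma = np.sqrt(np.mean(residuals ** 2))   # RMS error of the inlier residuals
    sigma2 = np.trace(hessian)                 # sum of the diagonal elements
    return beta * np.exp(-gamma ** 2 / 2.0) / sigma2 * hessian
```

In use, `point_to_plane_step` would typically be called repeatedly: the returned rotation vector is converted with `rotation_from_vector`, applied to the source points together with the translation, and the correspondences are then recomputed. The small-angle linearization is a common simplification for point-to-plane alignment, and it is only accurate when the remaining misalignment is small.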

Drift-free Visual SLAM using Digital Twins

Design of an Enhanced Visual Odometry by Building and Matching Compressive Panoramic Landmarks Online

Visual Localization in a Prior 3D LiDAR Map Combining Points and Lines

DVI-SLAM: A Dual Visual Inertial SLAM Network

Fusion of Monocular Vision and Radio-based Ranging for Global Scale Estimation and Drift Mitigation

Visual SLAM With Drift-Free Rotation Estimation in Manhattan World

D3VIL-SLAM: 3D Visual Inertial LiDAR SLAM for Outdoor Environments

Visual SLAM in dynamic environments based on object detection

SDVL: Efficient and Accurate Semi-Direct Visual Localization

VIR-SLAM: visual, inertial, and ranging SLAM for single and multi-robot systems

On combining visual SLAM and visual odometry

From Pixels to Precision: A Survey of Monocular Visual Odometry in Digital Twin Applications

A Novel Lidar-Assisted Monocular Visual SLAM Framework for Mobile Robots in Outdoor Environments

A photogrammetric approach for real‐time visual SLAM applied to an omnidirectional system

LDVI-SLAM: A Lightweight Monocular Visual-Inertial SLAM System for Dynamic Environments Based on Motion Constraints

PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression

Global Visual-Inertial Localization for Autonomous Vehicles with Pre-Built Map.

Accurate Visual-Inertial SLAM by Feature Re-identification

Fast Direct Stereo Visual SLAM

VR-SLAM: A Visual-Range Simultaneous Localization and Mapping System using Monocular Camera and Ultra-wideband Sensors

VID-SLAM: Robust Pose Estimation with RGBD-Inertial Input for Indoor Robotic Localization