Drift-free Visual SLAM using Digital Twins

Roxane Merat, Giovanni Cioffi, Leonard Bauersfeld, Davide Scaramuzza
2024-12-11
Abstract: Globally-consistent localization in urban environments is crucial for autonomous systems such as self-driving vehicles and drones, as well as assistive technologies for visually impaired people. Traditional Visual-Inertial Odometry (VIO) and Visual Simultaneous Localization and Mapping (VSLAM) methods, though adequate for local pose estimation, suffer from drift in the long term due to reliance on local sensor data. While GPS counteracts this drift, it is unavailable indoors and often unreliable in urban areas. An alternative is to localize the camera to an existing 3D map using visual-feature matching. This can provide centimeter-level accurate localization but is limited by the visual similarities between the current view and the map. This paper introduces a novel approach that achieves accurate and globally-consistent localization by aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching; no visual data association is needed. The proposed method provides a 6-DoF global measurement tightly integrated into the VIO/VSLAM system. Experiments run on a high-fidelity GPS simulator and real-world data collected from a drone demonstrate that our approach outperforms state-of-the-art VIO-GPS systems and offers superior robustness against viewpoint changes compared to the state-of-the-art Visual SLAM systems.
Robotics
What problem does this paper attempt to address?
The paper aims to achieve drift-free, globally consistent localization in urban environments, which is crucial for autonomous systems such as self-driving vehicles and drones, as well as assistive technologies for visually impaired people. Traditional methods based on visual-inertial odometry (VIO) and visual simultaneous localization and mapping (VSLAM) provide adequate local pose estimates, but because they rely only on local sensor data, their error accumulates as drift over long runs. GPS can counteract this drift, but it is unavailable indoors and often unreliable in urban areas. To address these problems, the authors align the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching, which yields accurate and globally consistent localization without visual data association. The resulting 6-DoF global measurement is tightly integrated into the VIO/VSLAM system, reducing drift and improving robustness to viewpoint changes.

### Key Problem Summary:

1. **Long-term Drift Problem**: Traditional VIO and VSLAM methods accumulate error during long-term operation, resulting in inaccurate localization.
2. **GPS Limitations**: GPS is unavailable indoors and often unreliable in urban areas, so it cannot provide continuous global localization.
3. **Limitations of Visual Feature Matching**: Localization based on visual feature matching depends on the visual similarity between the current view and the map, so it performs poorly under different viewpoints or environmental conditions.

### Proposed Solutions:

- Use point-to-plane matching to align the sparse 3D point cloud generated by VIO/VSLAM with the digital twin model.
- This method does not require visual data association and reduces sensitivity to viewpoint changes.
- Experiments on a high-fidelity GPS simulator and on real-world data collected from a drone show that this method outperforms state-of-the-art VIO-GPS systems.

### Formula Representation:

- The objective function of point-to-plane matching:

  \[ \min_{\delta R, \delta p} J = \sum_{j \in J_k} \left( \left[ \delta R \cdot a_j + \delta p - b_j \right] \cdot n_j \right)^2 \]

  where $\delta R$ and $\delta p$ are the rotation and translation parameters respectively, $a_j$ is a point in the source point cloud, $b_j$ is a point in the target point cloud, and $n_j$ is the normal vector at $b_j$.

- Adaptive weighting strategy:

  \[ W_k^m = \beta \frac{\exp\left(-\frac{\gamma^2}{2}\right)}{\sigma^2} \cdot H \]

  where $\gamma$ is the root-mean-square error of inliers, $\sigma^2$ is the sum of the diagonal elements of the Hessian matrix, $\beta$ is a scaling factor, and $H$ is a covariance matrix.

With these components, the authors report globally consistent localization in urban environments that outperforms state-of-the-art VIO-GPS systems and is more robust to viewpoint changes than state-of-the-art visual SLAM systems.
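To make the point-to-plane objective above more concrete, the following is a minimal NumPy sketch of one linearized least-squares step. It is not the authors' solver. It assumes the correspondences $(a_j, b_j, n_j)$ are already given and uses the common small-angle approximation $\delta R \approx I + [\omega]_\times$, under which each residual becomes $(a_j - b_j) \cdot n_j + \omega \cdot (a_j \times n_j) + \delta p \cdot n_j$. The function names `point_to_plane_cost`, `rotation_from_vector`, and `point_to_plane_step` are hypothetical and chosen only for illustration.

```python
import numpy as np


def point_to_plane_cost(dR, dp, a, b, n):
    """Evaluate J = sum_j ([dR a_j + dp - b_j] . n_j)^2 for (N, 3) arrays a, b, n."""
    residuals = np.einsum("ij,ij->i", a @ dR.T + dp - b, n)
    return float(np.sum(residuals ** 2))


def rotation_from_vector(w):
    """Rodrigues formula: rotation matrix for the rotation vector w (radians)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def point_to_plane_step(a, b, n):
    """One linearized step (assumption: small rotation, known correspondences).

    Solves the 6-parameter least-squares problem A x = -r0 with
    x = [omega, dp], A_j = [a_j x n_j, n_j], r0_j = (a_j - b_j) . n_j.
    """
    A = np.hstack([np.cross(a, n), n])            # (N, 6) Jacobian rows
    r0 = np.einsum("ij,ij->i", a - b, n)          # residuals at the identity
    x, *_ = np.linalg.lstsq(A, -r0, rcond=None)
    return rotation_from_vector(x[:3]), x[3:]
```

In a full pipeline this step would typically be repeated, with correspondences re-estimated against the digital twin after each update. The sketch leaves out correspondence search and outlier rejection.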
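The adaptive weighting formula above can also be evaluated directly. The sketch below only transcribes the expression as stated in this summary. How the paper obtains $\sigma^2$ and $H$ is not specified here, so both are taken as plain inputs, and the names `rms` and `adaptive_weight` are hypothetical.

```python
import numpy as np


def rms(inlier_residuals):
    """Root-mean-square of the inlier residuals (gamma in the formula)."""
    r = np.asarray(inlier_residuals, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))


def adaptive_weight(beta, gamma, sigma2, H):
    """Evaluate W = beta * exp(-gamma^2 / 2) / sigma^2 * H.

    beta: scaling factor; gamma: RMS error of inliers;
    sigma2: positive scalar normalizer; H: square matrix (assumed given).
    """
    H = np.asarray(H, dtype=float)
    return beta * np.exp(-gamma ** 2 / 2.0) / sigma2 * H
```

Under this expression, a larger inlier error $\gamma$ shrinks the weight exponentially, so a poorly fitting alignment contributes less to the estimate.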