MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

Chenyang Wu, Yifan Duan, Xinran Zhang, Yu Sheng, Jianmin Ji, Yanyong Zhang
2024-04-05
Abstract: Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses localization and mapping in unbounded scenes such as outdoor environments. The authors propose MM-Gaussian, a LiDAR-camera multi-modal fusion system built on 3D Gaussian point clouds, to solve the following issues:

1. **Inaccurate depth in vision-only solutions**: In unbounded outdoor scenes, estimating depth from visual information alone is unreliable. MM-Gaussian improves depth accuracy by incorporating the geometric structure information provided by solid-state LiDAR.
2. **Reconstruction of large-scale outdoor scenes**: Traditional SLAM methods struggle to reconstruct complex textures and details efficiently in large-scale outdoor scenes because of the limits of their map representations (point clouds, surfels, voxels, and so on). MM-Gaussian uses a 3D Gaussian point cloud representation, which enables real-time rendering of high-quality images and supports incremental map construction.
3. **Recovery after localization failure**: In practice, difficult environments (for example, textureless ground and walls) can cause localization to fail, which in turn degrades map construction. To address this, the authors designed a relocalization module that uses the rendering capability of the 3D Gaussian point cloud to bring the system back to the correct trajectory, making it more robust.

In summary, the main contributions of this paper are:

- Proposing MM-Gaussian, a multi-sensor fusion SLAM method based on 3D Gaussian point clouds that targets accurate localization and map construction in unbounded outdoor scenes.
- Designing a relocalization module that recovers the system's trajectory after a localization failure, improving robustness.
- Validating the method's effectiveness for localization and mapping through experiments in multiple scenarios.
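
As a rough illustration of how LiDAR geometry can supply the metric depth that a camera alone struggles to estimate outdoors, the sketch below projects LiDAR points into an image with a standard pinhole camera model and keeps the nearest depth per pixel. It is not the authors' implementation: the function name `project_lidar_to_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the assumption that the points are already expressed in the camera frame (that is, LiDAR-camera extrinsics have been applied) are all hypothetical choices made for this example.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the camera frame, in metres
Pixel = Tuple[int, int]              # (u, v) image coordinates


def project_lidar_to_depth(
    points_cam: Iterable[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Dict[Pixel, float]:
    """Project camera-frame LiDAR points to a sparse depth map.

    Assumes an ideal pinhole camera with no lens distortion
    (an illustrative simplification, not taken from the paper).
    """
    depth: Dict[Pixel, float] = {}
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera
        # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the closest surface when several points hit one pixel
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth
```

A sparse depth map like this is one way LiDAR measurements could be compared against depth rendered from a map; how MM-Gaussian actually uses the LiDAR data is described in the paper itself.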