Abstract:Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage:

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to use multi - modal sensor data (vision, depth, and inertial measurement) to achieve efficient, real - time, and high - quality 3D scene reconstruction and camera pose tracking in the Simultaneous Localization and Mapping (SLAM) task. Specifically, the paper proposes a new method named MM3DGS, aiming to improve the existing SLAM techniques through the following aspects: 1. **Multi - modal data fusion**: Combine the data of monocular or RGB - D camera images and Inertial Measurement Unit (IMU) to improve the robustness and accuracy of the SLAM system. This solves the limitations that may be encountered when only using visual or depth information, such as motion blur, exposure change, and long - distance depth estimation error. 2. **3D Gaussian representation**: Use 3D Gaussian distribution to represent the scene. This method not only supports real - time rendering but also can provide more accurate scale perception and trajectory alignment ability. Compared with the traditional sparse point cloud or Neural Radiance Field (NeRF) representation, the 3D Gaussian representation significantly reduces the computational cost while maintaining high visual quality. 3. **Real - time performance**: By optimizing key - frame selection, Gaussian initialization, and mapping process, ensure that the system can achieve real - time processing on consumer - level hardware. This is crucial for application scenarios that require a quick response, such as drone mapping, augmented reality, and autonomous mobile robots. 4. **Dataset release**: To verify the effectiveness of the proposed method, the author also releases a new multi - modal dataset (UT - MM), which contains RGB images, depth maps, IMU measurements, and LiDAR point cloud data of multiple indoor scenes collected from mobile robots. This dataset provides a valuable resource for subsequent research. In summary, the main goal of this paper is to develop an efficient, real - time, and high - quality SLAM solution by introducing multi - modal sensor data and 3D Gaussian representation, thereby enhancing the performance of existing techniques in complex environments.

MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap

Gaussian Splatting SLAM

LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting

Gaussian-LIC: Real-Time Photo-Realistic SLAM with Gaussian Splatting and LiDAR-Inertial-Camera Fusion

MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization

RGBD GS-ICP SLAM

IG-SLAM: Instant Gaussian SLAM

MotionGS : Compact Gaussian Splatting SLAM by Motion Filter

Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM