Abstract: Recently, the multimodal fusion of RGB, depth, and semantics has shown great potential in dense Simultaneous Localization and Mapping (SLAM), also known as dense semantic SLAM. Yet a prerequisite for generating consistent and continuous semantic maps is a dense, efficient, and scalable scene representation. To date, existing semantic SLAM systems based on explicit scene representations (points/meshes/surfels) are limited by their resolution and their inability to predict unknown areas, and thus fail to generate dense maps. In contrast, the few implicit scene representations (Neural Radiance Fields) that address these problems rely on time-consuming ray tracing-based volume rendering, which cannot meet the real-time rendering requirements of SLAM. Fortunately, the recently emerged Gaussian Splatting scene representation inherits the efficiency and scalability of point/surfel representations while smoothly representing geometric structures in a continuous manner, showing promise in addressing these challenges. To this end, we propose GS3LAM, a Gaussian Semantic Splatting SLAM framework that takes multimodal data as input and renders consistent, continuous dense semantic maps in real time. To fuse multimodal data, GS3LAM models the scene as a Semantic Gaussian Field (SG-Field) and jointly optimizes camera poses and the field by establishing error constraints between observed and predicted data. Furthermore, a Depth-adaptive Scale Regularization (DSR) scheme is proposed to tackle the misalignment between scale-invariant Gaussians and geometric surfaces within the SG-Field. To mitigate the forgetting phenomenon, we propose an effective Random Sampling-based Keyframe Mapping (RSKM) strategy, which notably outperforms the local covisibility optimization strategies commonly used in 3DGS-based SLAM systems.
Extensive experiments on benchmark datasets show that, compared with state-of-the-art competitors, GS3LAM demonstrates increased tracking robustness, superior real-time rendering quality, and enhanced semantic reconstruction precision. To make the results reproducible, the source code is available at https://github.com/lif314/GS3LAM.
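The contrast the abstract draws between random keyframe sampling and local optimization can be illustrated with a minimal sketch. It is not taken from the GS3LAM code: the function names, the uniform sampling policy, and the "most recent keyframes" window standing in for local covisibility selection are all assumptions made for illustration.

```python
import random


def sample_keyframes_random(num_keyframes, batch_size, rng):
    """Assumed RSKM-style policy: draw keyframe indices uniformly from the
    whole keyframe history, so that old views keep being revisited."""
    k = min(batch_size, num_keyframes)
    return sorted(rng.sample(range(num_keyframes), k))


def sample_keyframes_local(num_keyframes, batch_size):
    """Simplified stand-in for a local strategy: only the most recent
    keyframes are optimized, so earlier views are never revisited."""
    start = max(0, num_keyframes - batch_size)
    return list(range(start, num_keyframes))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print("random:", sample_keyframes_random(100, 5, rng))
    print("local: ", sample_keyframes_local(100, 5))
```

Under these assumptions, the random policy spreads mapping updates over the entire trajectory, which is one plausible reading of how forgetting of earlier regions is reduced; the actual sampling scheme should be checked against the released source code.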

KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences

EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy

GS3LAM: Gaussian Semantic Splatting SLAM

SfM-Free 3D Gaussian Splatting via Hierarchical Training

Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction

Gaussian Splatting SLAM

Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion

EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation

SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion

ZeroGS: Training 3D Gaussian Splatting from Unposed Images

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos

3D-HGS: 3D Half-Gaussian Splatting

MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

MotionGS: Compact Gaussian Splatting SLAM by Motion Filter

Deblur-GS: 3D Gaussian Splatting from Camera Motion Blurred Images