Abstract: Recently, the multi-modal fusion of RGB, depth, and semantics has shown great potential in dense Simultaneous Localization and Mapping (SLAM), also known as dense semantic SLAM. Yet generating consistent and continuous semantic maps requires dense, efficient, and scalable scene representations. Existing semantic SLAM systems based on explicit scene representations (points/meshes/surfels) are limited by their resolution and their inability to predict unknown areas, and thus fail to generate dense maps. In contrast, the few implicit scene representations (Neural Radiance Fields) that address these problems rely on time-consuming ray tracing-based volume rendering, which cannot meet the real-time rendering requirements of SLAM. Fortunately, the recently emerged Gaussian Splatting scene representation inherits the efficiency and scalability of point/surfel representations while smoothly representing geometric structures in a continuous manner, showing promise in addressing the aforementioned challenges. To this end, we propose GS3LAM, a Gaussian Semantic Splatting SLAM framework, which takes multimodal data as input and renders consistent, continuous dense semantic maps in real time. To fuse multimodal data, GS3LAM models the scene as a Semantic Gaussian Field (SG-Field) and jointly optimizes camera poses and the field by establishing error constraints between observed and predicted data. Furthermore, a Depth-adaptive Scale Regularization (DSR) scheme is proposed to tackle the misalignment between scale-invariant Gaussians and geometric surfaces within the SG-Field. To mitigate the forgetting phenomenon, we propose an effective Random Sampling-based Keyframe Mapping (RSKM) strategy, which exhibits notable superiority over the local covisibility optimization strategies commonly used in 3DGS-based SLAM systems.
Extensive experiments on benchmark datasets show that, compared with state-of-the-art competitors, GS3LAM demonstrates increased tracking robustness, superior real-time rendering quality, and enhanced semantic reconstruction precision. To make the results reproducible, the source code is available at https://github.com/lif314/GS3LAM.

Reinforcement Learning with Generalizable Gaussian Splatting

Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

GS3LAM: Gaussian Semantic Splatting SLAM

RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

SRGS: Super-Resolution 3D Gaussian Splatting

Focus On What Matters: Separated Models For Visual-Based RL Generalization

GGRt: Towards Generalizable 3D Gaussians Without Pose Priors in Real-Time

Unbounded-GS: Extending 3D Gaussian Splatting with Hybrid Representation for Unbounded Large-Scale Scene Reconstruction

GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs

Subequivariant Graph Reinforcement Learning in 3D Environments

Learning Task-relevant Representations for Generalization Via Characteristic Functions of Reward Sequence Distributions

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

Visual Grounding for Object-Level Generalization in Reinforcement Learning

RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Robotic Learning in your Backyard: A Neural Simulator from Open Source Components

GraspSplats: Efficient Manipulation with 3D Feature Splatting

LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers