Abstract: Recently, the multimodal fusion of RGB, depth, and semantics has shown great potential in dense Simultaneous Localization and Mapping (SLAM), a setting also known as dense semantic SLAM. Yet generating consistent and continuous semantic maps requires a dense, efficient, and scalable scene representation. To date, existing semantic SLAM systems based on explicit scene representations (points/meshes/surfels) are limited by their resolution and their inability to predict unknown areas, and thus fail to generate dense maps. In contrast, the few implicit scene representations (Neural Radiance Fields) that address these problems rely on time-consuming ray tracing-based volume rendering, which cannot meet the real-time rendering requirements of SLAM. Fortunately, the recently emerged Gaussian Splatting scene representation inherits the efficiency and scalability of point/surfel representations while smoothly representing geometric structures in a continuous manner, showing promise in addressing the aforementioned challenges. To this end, we propose GS3LAM, a Gaussian Semantic Splatting SLAM framework, which takes multimodal data as input and can render consistent, continuous dense semantic maps in real time. To fuse multimodal data, GS3LAM models the scene as a Semantic Gaussian Field (SG-Field) and jointly optimizes camera poses and the field by establishing error constraints between observed and predicted data. Furthermore, a Depth-adaptive Scale Regularization (DSR) scheme is proposed to tackle the misalignment between scale-invariant Gaussians and geometric surfaces within the SG-Field. To mitigate the forgetting phenomenon, we propose an effective Random Sampling-based Keyframe Mapping (RSKM) strategy, which exhibits notable superiority over the local covisibility optimization strategies commonly used in 3DGS-based SLAM systems.
Extensive experiments on benchmark datasets show that, compared with state-of-the-art competitors, GS3LAM demonstrates increased tracking robustness, superior real-time rendering quality, and enhanced semantic reconstruction precision. To make the results reproducible, the source code is available at https://github.com/lif314/GS3LAM.

PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM

GS3LAM: Gaussian Semantic Splatting SLAM

PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation

Panoramic Visual SLAM Technology for Spherical Images

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting

HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction

NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments

SLAM and 3D Semantic Reconstruction Based on the Fusion of Lidar and Monocular Vision

MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization

A Visual SLAM System Based on the Panoramic Camera

360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion Network

Panoptic 3D Scene Reconstruction From a Single RGB Image

A 3D Semantic Visual SLAM in Dynamic Scenes