Abstract: Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities because their data association usually relies on feature correspondences. Learning-based SLAM systems, in turn, often fall short in real-time performance and accuracy. Balancing real-time performance with dense 3D reconstruction remains a challenging problem. In this paper, we propose a real-time RGB-D SLAM system that uses a novel view synthesis technique, 3D Gaussian Splatting (3DGS), for 3D scene representation and pose estimation. The system leverages the real-time rasterization-based rendering of 3DGS and performs differentiable optimization in real time through a CUDA implementation. We also enable mesh reconstruction from the 3D Gaussians for explicit dense 3D reconstruction. To estimate accurate camera poses, we use a rotation-translation decoupled strategy with inverse optimization, in which rotation and translation are each updated over several iterations of gradient-based optimization. Given the existing 3D Gaussian map, each iteration differentiably renders RGB, depth, and silhouette maps and updates the camera parameters to minimize a combined loss consisting of a photometric loss, a depth geometry loss, and a visibility loss. However, 3DGS struggles to represent surfaces accurately because of the multi-view inconsistency of 3D Gaussians, which can reduce accuracy in both camera pose estimation and scene reconstruction. To address this, we use depth priors as additional regularization to enforce geometric constraints, thereby improving the accuracy of both pose estimation and 3D reconstruction. We also provide extensive experimental results on public benchmark datasets to demonstrate the effectiveness of the proposed methods in terms of pose accuracy, geometric accuracy, and rendering performance.
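
To make the combined tracking loss described in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' implementation: the function name `combined_pose_loss`, the weights `w_photo`, `w_depth`, `w_vis`, the threshold `sil_thresh`, the use of L1 errors, and the masking rule are all assumptions, and images are simplified to flat lists of per-pixel values (one intensity channel instead of RGB).

```python
from typing import Sequence


def combined_pose_loss(
    rendered_rgb: Sequence[float],
    observed_rgb: Sequence[float],
    rendered_depth: Sequence[float],
    observed_depth: Sequence[float],
    silhouette: Sequence[float],
    w_photo: float = 0.5,    # hypothetical weight
    w_depth: float = 1.0,    # hypothetical weight
    w_vis: float = 0.1,      # hypothetical weight
    sil_thresh: float = 0.5, # hypothetical visibility threshold
) -> float:
    """Weighted sum of photometric, depth, and visibility terms."""
    n = len(silhouette)
    if n == 0:
        return 0.0

    # Assumption: only pixels well covered by the existing map contribute
    # to the photometric and depth terms.
    mask = [s > sil_thresh for s in silhouette]

    # Photometric term: mean absolute intensity error over masked pixels.
    photo_terms = [
        abs(r - o) for r, o, m in zip(rendered_rgb, observed_rgb, mask) if m
    ]
    photo = sum(photo_terms) / len(photo_terms) if photo_terms else 0.0

    # Depth term: mean absolute depth error over masked pixels that have a
    # valid sensor reading (observed depth > 0).
    depth_terms = [
        abs(r - o)
        for r, o, m in zip(rendered_depth, observed_depth, mask)
        if m and o > 0.0
    ]
    depth = sum(depth_terms) / len(depth_terms) if depth_terms else 0.0

    # Visibility term: penalizes pixels the rendered map does not cover.
    vis = sum(1.0 - s for s in silhouette) / n

    return w_photo * photo + w_depth * depth + w_vis * vis


# Example: two pixels, one with an intensity mismatch of 0.5.
loss = combined_pose_loss(
    rendered_rgb=[0.5, 0.5],
    observed_rgb=[0.5, 0.0],
    rendered_depth=[1.0, 2.0],
    observed_depth=[1.0, 2.0],
    silhouette=[1.0, 1.0],
)
```

In an actual system, these quantities would come from a differentiable renderer, and the gradient of the loss with respect to the camera rotation and translation would drive the iterative pose updates; this sketch only shows how the three terms might be combined.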

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

Generalizable Human Gaussians for Sparse View Synthesis

FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting

Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

SRGS: Super-Resolution 3D Gaussian Splatting

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features