SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
2024-04-16
Abstract: Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera, surpassing the capabilities of existing methods. SplaTAM employs a simple online tracking and mapping system tailored to the underlying Gaussian representation. It utilizes a silhouette mask to elegantly capture the presence of scene density. This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper addresses a limitation of dense Simultaneous Localization and Mapping (SLAM): existing methods represent the scene in a non-volumetric or implicit way, which becomes a bottleneck in complex real-world environments. The paper proposes SplaTAM, which for the first time uses an explicit volumetric representation (3D Gaussians) to reconstruct a scene at high fidelity from a single unposed RGB-D camera. SplaTAM tracks the camera and builds the map by optimizing the 3D Gaussians online through differentiable rendering.

Compared to prior representations, this approach has the following advantages:

1. **Fast Rendering and Dense Optimization**: 3D Gaussians can be rendered into images at up to 400 frames per second, significantly faster than implicit alternatives.
2. **Maps with Clear Spatial Extents**: A rendered silhouette mask shows which parts of the scene are already mapped, so new content in a new view can be identified efficiently and the map can be expanded by adding more Gaussians.
3. **Direct Optimization of Scene Parameters**: Because the scene is represented by Gaussians with physical locations, colors, and sizes, there is a direct, nearly linear gradient flow between the parameters and the dense photometric loss, which enables rapid optimization.

Experimental results show that SplaTAM achieves up to 2x better performance than existing methods in camera pose estimation, map construction, and novel-view synthesis across multiple datasets, paving the way for more immersive high-fidelity SLAM applications.
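The silhouette-mask idea (advantage 2) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the Gaussians have already been projected to isotropic 2D footprints and that the silhouette is formed by front-to-back alpha compositing. The names `render_silhouette` and `unmapped_mask` and the 0.5 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def render_silhouette(centers_2d, radii_2d, opacities, depths, height, width):
    """Composite isotropic 2D Gaussian footprints front to back.

    Assumption (not from the source): each Gaussian is already projected to
    an image-space center (x, y), a pixel radius, an opacity, and a depth.
    Returns an (height, width) array of accumulated opacity in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2), (x, y) order

    silhouette = np.zeros((height, width))
    transmittance = np.ones((height, width))  # light not yet absorbed per pixel

    for i in np.argsort(depths):  # nearest Gaussian first
        sq_dist = np.sum((pixels - centers_2d[i]) ** 2, axis=-1)
        alpha = opacities[i] * np.exp(-sq_dist / (2.0 * radii_2d[i] ** 2))
        silhouette += transmittance * alpha
        transmittance *= 1.0 - alpha

    return silhouette


def unmapped_mask(silhouette, threshold=0.5):
    """Pixels whose accumulated opacity is below the (illustrative) threshold."""
    return silhouette < threshold


# Usage: one Gaussian near the left side of a 16x32 image.
sil = render_silhouette(
    centers_2d=np.array([[8.0, 8.0]]),
    radii_2d=np.array([3.0]),
    opacities=np.array([0.99]),
    depths=np.array([1.0]),
    height=16,
    width=32,
)
new_content = unmapped_mask(sil)  # True where the map does not yet cover the view
```

In a pipeline of this kind, pixels flagged by `unmapped_mask` would be the candidates for adding new Gaussians, while already-covered pixels would be the ones used for tracking. The exact criteria are a design decision that this sketch does not try to reproduce.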