PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments

Haoang Li, Xiangqi Meng, Xingxing Zuo, Zhe Liu, Hesheng Wang, Daniel Cremers
2024-11-24
Abstract: Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open question. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited accuracy of camera localization. Other works express dynamic objects by point clouds, sparse joints, or coarse meshes, which fail to provide a photo-realistic representation. To overcome the above limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting. Our method is composed of three main modules to 1) map the dynamic foreground including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans and exploit geometric and appearance constraints of humans and items. For background mapping, we design an optimization strategy between neighboring local maps by integrating appearance constraints into geometric alignment. For camera localization, we leverage both static background and dynamic foreground to increase the observations for noise compensation. We explore the geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation. Source code will be publicly available upon paper acceptance.
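The abstract says foreground mapping models the deformations and/or motions of humans and items and uses human shape priors, but it does not spell out the deformation model. A common choice for Gaussian-based human models, assumed here rather than taken from the paper, is to anchor each Gaussian to a parametric body skeleton and move its center with linear blend skinning (LBS); a rigid item is then the special case of a single transform with weight 1. The minimal sketch below deforms Gaussian centers only (updating each Gaussian's covariance is omitted), and `RigidTransform` and `skin_centers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class RigidTransform:
    """Rotation R (row-major 3x3) and translation t, e.g. one joint's or one item's motion."""
    R: Mat3
    t: Vec3

    def apply(self, p: Vec3) -> Vec3:
        # Compute R @ p + t component by component.
        x, y, z = (sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i] for i in range(3))
        return (x, y, z)


def skin_centers(
    centers: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    transforms: Sequence[RigidTransform],
) -> List[Vec3]:
    """Linear blend skinning of canonical Gaussian centers: x' = sum_k w_k (R_k x + t_k).

    Assumption: LBS is used only as an illustration; it is not confirmed
    to be PG-SLAM's exact deformation model.
    """
    out: List[Vec3] = []
    for x, w in zip(centers, weights):
        if len(w) != len(transforms) or abs(sum(w) - 1.0) > 1e-9:
            raise ValueError("need one weight per transform, summing to 1")
        moved = [T.apply(x) for T in transforms]  # x under every joint's motion
        blended = tuple(sum(wk * m[i] for wk, m in zip(w, moved)) for i in range(3))
        out.append(blended)  # weighted blend of the moved positions
    return out


# Demo: two hypothetical "joints", one static and one lifted by 1 unit along z.
identity = RigidTransform(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 0.0))
lift = RigidTransform(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 1.0))
print(skin_centers([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(1.0, 0.0), (0.5, 0.5)], [identity, lift]))
# [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
```

With one transform and weight `(1.0,)`, `skin_centers` reduces to applying a single rigid motion, so under this assumption a rigid item could reuse the same code path as a human.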
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses Simultaneous Localization and Mapping (SLAM) in dynamic environments. Conventional SLAM methods perform well in static scenes but face several challenges when parts of the scene move. The main problems are:

1. **Incomplete scene reconstruction**: Many existing methods simply filter out dynamic objects, so those objects are never reconstructed and the resulting map is incomplete.
2. **Limited camera localization accuracy**: Discarding dynamic objects also discards their observations, which degrades camera localization, especially when dynamic objects dominate the image.
3. **Lack of photo-realistic representation**: Existing methods that do model dynamic objects represent them with point clouds, sparse joints, or coarse meshes, which cannot provide photo-realistic rendering.

To address these problems, the paper proposes PG-SLAM, an RGB-D SLAM method that extends Gaussian splatting. It aims to:

- **Map the dynamic foreground**: Reconstruct non-rigid humans and rigid items by modeling their deformations and/or motions, using human shape priors together with geometric and appearance constraints.
- **Reconstruct the static background**: Optimize neighboring local maps by integrating appearance constraints into their geometric alignment, so that the background is reconstructed accurately.
- **Localize the camera**: Use both the static background and the dynamic foreground, combining geometric and appearance constraints, to increase the number of observations and compensate for noise.

### Main Contributions

1. **A Gaussian-splatting-based SLAM method for dynamic scenes**: The authors present it as the first such method that not only localizes the camera and reconstructs the static background but also maps dynamic humans and items.
2. **Photo-realistic representation of dynamic scenes**: Foreground mapping exploits human shape priors together with geometric and appearance constraints; background mapping uses an optimization strategy between neighboring local maps.
3. **Camera localization with combined geometric and appearance constraints**: By associating 3D Gaussians with 2D optical flows and pixel patches, the method uses both static background and dynamic foreground observations to compensate for noise and improve localization accuracy.

Experiments on multiple real-world datasets show that the method outperforms state-of-the-art approaches in camera localization and scene representation.
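Contribution 3 pairs a geometric constraint (3D Gaussians associated with 2D optical flow) with an appearance constraint (pixel patches). The sketch below shows one way such a combined cost could score a candidate camera pose; it is an illustration under stated assumptions, not PG-SLAM's implementation. Assumed: a pinhole camera with intrinsics `(fx, fy, cx, cy)`; each Gaussian center is already expressed at the current time (foreground Gaussians moved by their estimated motion); optical flow supplies the expected pixel position of each Gaussian; patches are sampled at the nearest pixel; and the weight `lam` is hypothetical. The paper's residual definitions, weights, and optimizer are not given in this summary, so the example only compares a few candidate translations. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)
Image = List[List[float]]  # grayscale intensities, indexed image[row][col]


def project(p: Vec3, R: Mat3, t: Vec3, K: Intrinsics) -> Vec2:
    """Pinhole projection of world point p with a world-to-camera pose (R, t)."""
    fx, fy, cx, cy = K
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0.0:
        raise ValueError("point lies behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def sample_patch(image: Image, u: float, v: float, r: int) -> List[float]:
    """(2r+1)^2 intensities around the nearest pixel; [] if the patch leaves the image."""
    col, row = round(u), round(v)
    if row - r < 0 or col - r < 0 or row + r >= len(image) or col + r >= len(image[0]):
        return []
    return [image[y][x] for y in range(row - r, row + r + 1)
            for x in range(col - r, col + r + 1)]


def pose_cost(
    centers: Sequence[Vec3],              # background + foreground Gaussian centers
    flow_targets: Sequence[Vec2],         # pixel positions predicted by optical flow
    ref_patches: Sequence[List[float]],   # reference patch intensities per Gaussian
    image: Image,
    R: Mat3, t: Vec3, K: Intrinsics,
    lam: float = 0.1, r: int = 1,
) -> float:
    """Geometric (flow reprojection) term + lam * appearance (patch SSD) term.

    Assumption: illustrative cost only; PG-SLAM's exact residuals are not specified here.
    """
    geometric, appearance = 0.0, 0.0
    for p, (uf, vf), ref in zip(centers, flow_targets, ref_patches):
        u, v = project(p, R, t, K)
        geometric += (u - uf) ** 2 + (v - vf) ** 2
        obs = sample_patch(image, u, v, r)
        if obs:  # skip the appearance term when the patch falls outside the image
            appearance += sum((a - b) ** 2 for a, b in zip(obs, ref))
    return geometric + lam * appearance


# Demo on a synthetic 11x11 gradient image with the true pose at the origin.
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
K = (10.0, 10.0, 5.0, 5.0)
image = [[float(x + 10 * y) for x in range(11)] for y in range(11)]
centers = [(0.0, 0.0, 1.0), (0.2, 0.1, 1.0)]
targets = [project(p, I3, (0.0, 0.0, 0.0), K) for p in centers]
refs = [sample_patch(image, u, v, 1) for u, v in targets]
candidates = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, -0.1, 0.0)]
best = min(candidates, key=lambda t: pose_cost(centers, targets, refs, image, I3, t, K))
print(best)  # (0.0, 0.0, 0.0)
```

Because background and foreground Gaussians both contribute residuals in this sketch, a frame dominated by moving people still yields constraints, which mirrors the stated motivation of using the foreground to add observations for noise compensation.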