Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung
2024-04-08
Abstract: The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras" aims to solve the following problems:

1. **Real-time localization and high-fidelity mapping**:
   - Existing SLAM systems can perform geometric localization and mapping, but they usually cannot deliver real-time, high-fidelity view reconstruction on resource-constrained devices. Methods that rely entirely on implicit representations demand so much computation that they are difficult to run on portable devices.
   - The paper proposes a new SLAM framework, Photo-SLAM, which achieves high-fidelity view reconstruction while maintaining real-time performance.
2. **Multi-sensor support**:
   - Existing SLAM systems usually support only one specific type of sensor (monocular, stereo, or RGB-D camera) rather than all of them.
   - Photo-SLAM is designed to support monocular, stereo, and RGB-D cameras, and is suitable for both indoor and outdoor environments.
3. **Efficient learning method**:
   - Existing SLAM systems require a large amount of computational resources during optimization, especially when generating high-fidelity views.
   - The paper introduces a Gaussian-Pyramid-based learning method that progressively learns multi-level features and improves mapping performance.
4. **Combination of geometric features and texture information**:
   - Existing SLAM systems either focus on extracting and optimizing geometric features or rely on implicit representations to capture texture information, and they often struggle to balance the two.
   - Photo-SLAM achieves accurate localization and high-quality mapping by combining explicit geometric features with implicit photometric features.

### Main contributions

1. **Innovative framework**:
   - A real-time localization and high-fidelity mapping system based on the hyper primitives map, supporting multiple sensor types.
2. **Efficient progressive learning method**:
   - A Gaussian-Pyramid-based learning method that progressively learns multi-level features and improves mapping quality.
3. **High-performance implementation**:
   - The system is fully implemented in C++ and CUDA, achieves state-of-the-art performance, and can run in real time on embedded platforms.

### Experimental results

- **Quantitative analysis**:
  - Experiments on the Replica and TUM RGB-D datasets show that Photo-SLAM outperforms existing SLAM systems in both localization accuracy and mapping quality.
  - In the monocular scenario in particular, Photo-SLAM is significantly better than other methods, and it can render in real time on resource-constrained devices.
- **Qualitative analysis**:
  - Photo-SLAM generates high-fidelity views and avoids the over-smoothing and obvious artifacts common in other methods.

### Conclusion

With the Photo-SLAM framework, the paper addresses the shortcomings of existing SLAM systems in real-time, high-fidelity view reconstruction. It provides multi-sensor support, an efficient progressive learning method, and a high-performance implementation. These improvements give Photo-SLAM strong potential for practical robotics applications.
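
The Gaussian-Pyramid-based learning idea mentioned above can be illustrated with a minimal sketch. This is not the authors' C++/CUDA implementation. It assumes a 5-tap binomial kernel as the Gaussian approximation and a coarse-to-fine schedule in which supervision starts at the coarsest pyramid level and moves to finer levels as training proceeds. The names `gaussian_pyramid`, `level_for_iteration`, and `iters_per_level` are hypothetical and chosen only for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

# 5-tap binomial kernel, a common approximation of a Gaussian (assumption).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def _blur_rows(img: Image) -> Image:
    """Convolve every row with KERNEL, clamping indices at the borders."""
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(KERNEL):
                xx = min(max(x + k - 2, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def blur(img: Image) -> Image:
    """Separable blur: filter the rows, then the columns."""
    return _transpose(_blur_rows(_transpose(_blur_rows(img))))


def downsample(img: Image) -> Image:
    """Blur, then keep every second pixel in both directions."""
    return [row[::2] for row in blur(img)[::2]]


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [level 0 (full resolution), ..., level levels-1 (coarsest)]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def level_for_iteration(it: int, iters_per_level: int, levels: int) -> int:
    """Hypothetical coarse-to-fine schedule: begin at the coarsest level and
    step one level finer every `iters_per_level` iterations."""
    return max(levels - 1 - it // iters_per_level, 0)


if __name__ == "__main__":
    target = [[1.0] * 8 for _ in range(8)]
    pyramid = gaussian_pyramid(target, levels=3)
    for it in (0, 100, 250):
        lvl = level_for_iteration(it, iters_per_level=100, levels=3)
        ground_truth = pyramid[lvl]  # image the rendering would be compared to
        print(it, lvl, len(ground_truth), len(ground_truth[0]))
```

In a training loop under these assumptions, the rendered image at each iteration would be compared against `pyramid[level_for_iteration(...)]`, so early iterations fit low-frequency content and later iterations refine fine detail.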