Abstract: The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper "Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras" aims to solve the following problems:

1. **Real-time localization and high-fidelity mapping**:
   - Existing SLAM systems can perform geometric localization and mapping, but they usually cannot deliver real-time, high-fidelity view reconstruction on resource-constrained devices. Systems that do target photorealistic reconstruction often rely on implicit representations and demand so much computation that they are difficult to run on portable devices.
   - This paper proposes a new SLAM framework, Photo-SLAM, which achieves high-fidelity view reconstruction while maintaining real-time performance.
2. **Multi-sensor support**:
   - Existing SLAM systems usually support only one specific sensor type (monocular, stereo, or RGB-D cameras) rather than several.
   - Photo-SLAM is designed to support monocular, stereo, and RGB-D cameras, and is suitable for both indoor and outdoor environments.
3. **Efficient learning method**:
   - Existing SLAM systems require a large amount of computational resources during optimization, especially when generating high-fidelity views.
   - This paper introduces a Gaussian-Pyramid-based learning method that progressively learns multi-level features and improves mapping performance.
4. **Combination of geometric features and texture information**:
   - Existing SLAM systems either focus on extracting and optimizing geometric features or rely on implicit representations to capture texture information, and they often struggle to balance the two.
   - Photo-SLAM achieves accurate localization and high-quality mapping by combining explicit geometric features with implicit photometric features.

### Main contributions

1. **Innovative framework**:
   - Proposes a real-time localization and high-fidelity mapping system based on a hyper primitives map, which supports multiple sensor types.
2. **Efficient progressive learning method**:
   - Introduces a Gaussian-Pyramid-based learning method that progressively learns multi-level features and improves mapping quality.
3. **High-performance implementation**:
   - The system is fully implemented in C++ and CUDA, achieves state-of-the-art performance, and can run in real time on embedded platforms.

### Experimental results

- **Quantitative analysis**:
  - Experiments on the Replica and TUM RGB-D datasets show that Photo-SLAM outperforms existing SLAM systems in both localization accuracy and mapping quality.
  - In the monocular scenario in particular, Photo-SLAM is significantly better than other methods and can render in real time on resource-constrained devices.
- **Qualitative analysis**:
  - Photo-SLAM generates high-fidelity views and avoids the over-smoothing and obvious artifacts common in other methods.

### Conclusion

With the Photo-SLAM framework, this paper addresses the shortcomings of existing SLAM systems in real-time high-fidelity view reconstruction, and provides multi-sensor support, an efficient progressive learning method, and a high-performance implementation. These improvements give Photo-SLAM strong potential for practical robotics applications.
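To make the Gaussian-Pyramid-based learning idea more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation (which the summary describes as C++ and CUDA). The function names `blur`, `gaussian_pyramid`, and `level_for_iteration`, the 5-tap binomial kernel, and the evenly split coarse-to-fine schedule are all assumptions chosen for illustration.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation of a Gaussian (assumption).
_KERNEL_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(image):
    """Separable blur of an H x W or H x W x C image, using edge padding."""
    out = image.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")
        acc = np.zeros_like(out)
        # Accumulate the five shifted, weighted copies along this axis.
        for k, weight in enumerate(_KERNEL_1D):
            index = [slice(None)] * out.ndim
            index[axis] = slice(k, k + out.shape[axis])
            acc += weight * padded[tuple(index)]
        out = acc
    return out


def gaussian_pyramid(image, levels):
    """Return [full resolution, half resolution, ...] with `levels` entries."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        # Blur first, then keep every second row and column.
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid


def level_for_iteration(iteration, total_iterations, levels):
    """Hypothetical schedule: start at the coarsest level, end at full resolution."""
    stage = min(iteration * levels // total_iterations, levels - 1)
    return levels - 1 - stage


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((64, 48, 3))
    targets = gaussian_pyramid(frame, levels=3)
    for it in (0, 100, 200):
        lvl = level_for_iteration(it, total_iterations=300, levels=3)
        print(it, lvl, targets[lvl].shape)
```

In a training loop, the rendered image would be compared against `targets[lvl]` at the matching resolution, so early iterations fit coarse structure and later ones fit fine texture. How Photo-SLAM actually schedules the levels and computes its loss is not specified in this summary.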

Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments

GS3LAM: Gaussian Semantic Splatting SLAM

Gaussian-LIC: Real-Time Photo-Realistic SLAM with Gaussian Splatting and LiDAR-Inertial-Camera Fusion

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields

Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping

Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM

SG-SLAM: A Real-Time RGB-D Visual SLAM Toward Dynamic Scenes With Semantic and Geometric Information

RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting

EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment

DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes

MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting

HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction

NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

SCE-SLAM: a real-time semantic RGBD SLAM system in dynamic scenes based on spatial coordinate error

SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding

MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

CP-SLAM: Collaborative Neural Point-based SLAM System