Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi
2024-10-09
Abstract: We propose Hi-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hi-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how a visual SLAM (Simultaneous Localization and Mapping) system can scale up its semantic information efficiently, so that it achieves accurate global 3D semantic mapping, scales to many classes, and predicts explicit semantic labels in 3D. The authors propose Hi-SLAM, which uses a hierarchically categorical Gaussian Splatting representation to tackle three problems:

1. **High parameter usage**: As the environment grows more complex, the number of parameters a semantic SLAM system needs increases significantly, making scene understanding difficult and costly.
2. **High storage and training time requirements**: Attaching a discrete semantic label directly to each 3D point significantly increases storage requirements and processing time, especially when the number of semantic categories is large.
3. **Lack of hierarchical semantic representation**: Existing methods usually adopt a flat semantic representation and ignore the natural hierarchical structure of semantic information, which limits their understanding of complex scenes.

To solve these problems, Hi-SLAM introduces the following innovations:

- **Hierarchical categorical representation**: Semantic information is encoded through a hierarchical tree structure, which compresses semantic data into a more compact form and thereby reduces memory usage and training time.
- **New semantic loss function**: The loss combines inter-level and cross-level optimization so that the hierarchical semantic information is optimized both within each level and across levels.
- **Enhanced SLAM system**: The tracking and mapping performance of the whole SLAM system is improved, yielding higher accuracy and a 2x operation speed-up.

These improvements enable Hi-SLAM to handle a complex real-world scene with more than 500 semantic classes, demonstrating its scalability.
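
To make the compactness argument concrete, here is a minimal sketch of a tree-structured class encoding. It is not the authors' implementation: the toy `TREE`, the helper names, and the choice of one one-hot block per level are all assumptions made for illustration. It shows that a leaf class can be stored as a short path through the tree, so the per-point semantic vector grows with the sum of the level widths rather than with the total number of classes.

```python
from math import prod

# Hypothetical toy hierarchy (not from the paper): every leaf class is
# reached by a path from the root, one child choice per level.
TREE = {
    "background": {"wall": {}, "floor": {}},
    "furniture": {"chair": {}, "table": {}, "sofa": {}},
    "object": {"cup": {}, "book": {}},
}


def leaf_paths(tree, prefix=()):
    """Yield (leaf_name, path); path holds one child index per level."""
    for idx, (name, children) in enumerate(tree.items()):
        path = prefix + (idx,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield name, path


def level_widths(paths):
    """Number of slots needed at each level (largest child index + 1)."""
    depth = max(len(p) for p in paths)
    return [max(p[l] for p in paths if len(p) > l) + 1 for l in range(depth)]


def encode(path, widths):
    """Concatenate one one-hot block per level into a single flat vector."""
    code = []
    for level, width in enumerate(widths):
        block = [0.0] * width
        if level < len(path):
            block[path[level]] = 1.0
        code.extend(block)
    return code


def decode(code, widths):
    """Take the argmax inside each level's block to recover the path.

    Assumes every leaf sits at the same depth, as in the toy TREE.
    """
    path, start = [], 0
    for width in widths:
        block = code[start:start + width]
        path.append(max(range(width), key=block.__getitem__))
        start += width
    return tuple(path)


PATHS = dict(leaf_paths(TREE))
WIDTHS = level_widths(list(PATHS.values()))
sofa_code = encode(PATHS["sofa"], WIDTHS)

print("level widths:", WIDTHS)
print("sofa path:", PATHS["sofa"], "code:", sofa_code)
print("decoded:", decode(sofa_code, WIDTHS))

# Illustrative arithmetic with hypothetical widths: three levels of eight
# children each address prod = 512 leaves using only sum = 24 values.
print("values per point:", sum([8, 8, 8]), "classes covered:", prod([8, 8, 8]))
```

In this toy tree the saving is small (6 values instead of 7 for a flat one-hot vector), but the gap widens quickly: with three levels of eight children each, 24 values address 512 leaf classes. How the paper actually builds its tree and sizes its levels is not stated in this summary.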