S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera

Thanh Nguyen Canh, Van-Truong Nguyen, Xiem HoangVan, Armagan Elibol, Nak Young Chong
2024-01-16
Abstract: Unmanned Aerial Vehicles (UAVs) hold immense potential for critical applications, such as search and rescue operations, where accurate perception of indoor environments is paramount. However, the concurrent amalgamation of localization, 3D reconstruction, and semantic segmentation presents a notable hurdle, especially in the context of UAVs equipped with constrained power and computational resources. This paper presents a novel approach to address challenges in semantic information extraction and utilization within UAV operations. Our system integrates state-of-the-art visual SLAM to estimate a comprehensive 6-DoF pose and advanced object segmentation methods at the back end. To improve the computational and storage efficiency of the framework, we adopt a streamlined voxel-based 3D map representation, OctoMap, to build a working system. Furthermore, a fusion algorithm is incorporated to obtain the semantic information of each frame from the front-end SLAM task, and the corresponding point. By leveraging semantic information, our framework enhances the UAV's ability to perceive and navigate through indoor spaces, addressing challenges in pose estimation accuracy and uncertainty reduction. Through Gazebo simulations, we validate the efficacy of our proposed system and successfully embed our approach into a Jetson Xavier AGX unit for real-world applications.
Robotics
What problem does this paper attempt to address?
The paper addresses how to extract and use semantic information efficiently in unmanned aerial vehicle (UAV) operations, in particular the combined challenge of accurate localization, 3D reconstruction, and semantic segmentation in indoor environments. It proposes Semantic Segmentation Sparse Mapping (S3M), which targets the following issues:

1. **Accuracy of Localization and 3D Reconstruction**: Achieving high-precision localization and 3D reconstruction simultaneously on a resource-constrained UAV is difficult. The paper improves both by combining state-of-the-art visual SLAM, which estimates the 6-DoF pose, with advanced object segmentation methods.
2. **Extraction and Integration of Semantic Information**: Traditional SLAM generates only geometric maps and provides no semantic information. The proposed method fuses the semantic information of each frame with the output of the front-end SLAM task, so that the UAV understands its environment well enough for complex tasks such as maneuvering around a table to find a victim by the bedside.
3. **Computational and Storage Efficiency**: To run on resource-limited platforms, the paper adopts OctoMap, an octree-based voxel representation of the 3D map, which reduces storage requirements.
4. **Real-time Application**: The system was validated in Gazebo simulations and embedded into a Jetson Xavier AGX unit for real-time semantic mapping in real-world applications.

In summary, the main goal of the paper is to improve the perception and navigation capabilities of UAVs in complex indoor environments through an efficient semantic sparse mapping method that addresses the limitations of purely geometric SLAM.
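The idea of fusing per-frame semantic labels into a sparse voxel map can be illustrated with a minimal sketch. This is not the paper's fusion algorithm: it assumes a pinhole camera model, a pose supplied by the SLAM front end, and a simple majority vote per voxel, and it uses a plain dictionary as a stand-in for OctoMap's octree. All names (`back_project`, `to_world`, `SemanticVoxelMap`) and all numeric values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to camera-frame XYZ."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(point, rotation, translation):
    """Apply a camera-to-world pose: 3x3 rotation (nested lists) plus translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


class SemanticVoxelMap:
    """Sparse voxel grid keyed by integer indices (a stand-in for an octree).

    Only observed voxels are stored; each keeps a count of the labels seen in it.
    """

    def __init__(self, resolution=0.1):
        self.resolution = resolution      # voxel edge length in metres (assumed)
        self.votes = defaultdict(Counter)

    def key(self, point):
        """Integer voxel index containing a world-frame point."""
        return tuple(math.floor(c / self.resolution) for c in point)

    def integrate(self, pixels, intrinsics, rotation, translation):
        """Fuse one frame given as (u, v, depth, label) tuples."""
        fx, fy, cx, cy = intrinsics
        for u, v, depth, label in pixels:
            if depth <= 0:                # skip invalid depth readings
                continue
            cam_point = back_project(u, v, depth, fx, fy, cx, cy)
            world_point = to_world(cam_point, rotation, translation)
            self.votes[self.key(world_point)][label] += 1

    def label(self, voxel):
        """Majority label of a voxel, or None if the voxel was never observed."""
        counts = self.votes.get(voxel)
        return counts.most_common(1)[0][0] if counts else None


# Usage with made-up values: identity pose, three valid pixels in one voxel.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
frame = [
    (50, 50, 1.2, "table"),
    (60, 50, 1.2, "table"),
    (50, 60, 1.2, "bed"),
    (50, 50, 0.0, "bed"),                 # invalid depth, ignored
]
demo_map = SemanticVoxelMap(resolution=0.5)
demo_map.integrate(frame, (100.0, 100.0, 50.0, 50.0), identity, (0.0, 0.0, 0.0))
print(demo_map.label((0, 0, 2)))          # prints "table"
```

Storing only observed voxels is what keeps the map sparse. An octree such as OctoMap additionally merges neighbouring voxels with identical state and keeps occupancy probabilities, which this sketch leaves out.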