Abstract: Traditional visual simultaneous localization and mapping (SLAM) systems assume a static environment and therefore struggle to achieve real-time indoor 3D reconstruction in complex dynamic scenes. To address this, this paper proposes a real-time indoor 3D reconstruction algorithm based on semantic visual SLAM. Object detection supplies 2D semantic information, which serves as prior information for geometric methods; fusing the two effectively suppresses dynamic features, reduces reliance on deep learning methods, and preserves the algorithm's real-time performance. Experimental results on dynamic scenes show that, compared with the ORB-SLAM2 system, our algorithm achieves an average performance improvement of approximately 97.56% on the TUM RGB-D dataset and 97.31% on the Bonn dataset while keeping real-time performance nearly unchanged. Moreover, our algorithm can reconstruct a global indoor OctoMap and semantic metric maps, which are more intuitive than sparse point cloud maps; this enhances the scene perception capability of mobile robots and lays the foundation for performing advanced tasks. Furthermore, our algorithm demonstrates a 3.5 to 10.5 times improvement in real-time performance over other mainstream semantic SLAM systems. Experimental results on the NVIDIA Jetson AGX Xavier confirm that our algorithm can run in real time on low-power platforms such as mobile robots or drones. The algorithm has two main drawbacks: lower reconstruction accuracy in low-texture and large-scale scenes, and ineffective suppression of dynamic features in low-dynamic scenes. Future work will consider replacing and improving the deep learning methods and integrating an IMU and other sensors to enhance system usability.
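The fusion of a 2D detection prior with a geometric check, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the class set `DYNAMIC_CLASSES`, the function `filter_dynamic_features`, both thresholds, and the idea of applying a stricter geometric threshold inside detected boxes are all assumptions chosen for illustration. The per-keypoint residual stands in for any geometric consistency measure, such as distance to the epipolar line.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical set of classes treated as potentially dynamic (an assumption).
DYNAMIC_CLASSES = {"person"}


@dataclass
class Detection:
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def in_box(pt: Tuple[float, float], box: Tuple[float, float, float, float]) -> bool:
    """True if the pixel coordinate lies inside the axis-aligned box."""
    x, y = pt
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def filter_dynamic_features(
    keypoints: List[Tuple[float, float]],
    residuals: List[float],
    detections: List[Detection],
    strict_thresh: float = 1.0,  # assumed threshold inside dynamic-class boxes
    loose_thresh: float = 3.0,   # assumed threshold everywhere else
) -> List[int]:
    """Return the indices of keypoints kept as static.

    The detection box acts only as a prior: a keypoint inside a box of a
    potentially dynamic class is judged with the stricter geometric
    threshold, so static points on a motionless person can survive.
    """
    dynamic_boxes = [d.box for d in detections if d.label in DYNAMIC_CLASSES]
    kept = []
    for i, (pt, residual) in enumerate(zip(keypoints, residuals)):
        prior_dynamic = any(in_box(pt, b) for b in dynamic_boxes)
        thresh = strict_thresh if prior_dynamic else loose_thresh
        if residual <= thresh:
            kept.append(i)
    return kept


# Toy example with made-up values.
keypoints = [(10, 10), (50, 50), (55, 60), (200, 200)]
residuals = [0.5, 2.0, 0.4, 2.0]
detections = [
    Detection("person", (40, 40, 100, 100)),
    Detection("chair", (190, 190, 210, 210)),
]
kept = filter_dynamic_features(keypoints, residuals, detections)
print(kept)  # [0, 2, 3]
```

In this toy run, the keypoint at (50, 50) lies inside the person box and fails the strict threshold, so it is dropped; the one at (55, 60) is inside the same box but geometrically consistent, so it is kept. Using the detector only to tighten a cheap geometric test, rather than running per-pixel segmentation, is one way such a design could stay lightweight.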

Large-Scale 3D Semantic Mapping Using Monocular Vision

Multimodal sensor-based semantic 3D mapping for a large-scale environment

LODM: Large-scale Online Dense Mapping for UAV

Object-aware Semantic Mapping of Indoor Scenes Using Octomap

Building and optimization of 3D semantic map based on Lidar and camera fusion

Semi-Dense 3D Semantic Mapping from Monocular SLAM

LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes

Semantic 3D occupancy mapping through efficient high order CRFs

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Towards a Meaningful 3D Map Using a 3D Lidar and a Camera

An Approach for Construct Semantic Map with Scene Classification and Object Semantic Segmentation

Real-time Dense 3D Semantic Mapping Using RGB-D Camera

3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame

Semantic 3D Mapping from Deep Image Segmentation

Research on Indoor 3D Reconstruction Technology Based on Semantic Visual Simultaneous Localization and Mapping

Semantic 3D Reconstruction with Learning MVS and 2D Segmentation of Aerial Images

A semantic SLAM-based dense mapping approach for large-scale dynamic outdoor environment

Probabilistic Semantic Mapping for Autonomous Driving in Urban Environments

City-scale Continual Neural Semantic Mapping with Three-Layer Sampling and Panoptic Representation

Sparse semantic map building and relocalization for UGV using 3D point clouds in outdoor environments

SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation