DLD-SLAM: RGB-D Visual Simultaneous Localisation and Mapping in Indoor Dynamic Environments Based on Deep Learning

Han Yu, Qing Wang, Chao Yan, Youyang Feng, Yang Sun, Lu Li
DOI: https://doi.org/10.3390/rs16020246
IF: 5
2024-01-09
Remote Sensing
Abstract: This work presents a novel RGB-D dynamic Simultaneous Localisation and Mapping (SLAM) method that relies on lightweight deep learning to improve the precision, stability, and efficiency of localisation in dynamic environments compared to traditional static feature-based visual SLAM algorithms. Based on ORB-SLAM3, the method replaces ORB with the GCNv2-tiny network, which improves the reliability of feature extraction and matching and the accuracy of position estimation; then, the semantic segmentation thread employs the lightweight YOLOv5s object detection algorithm based on the GSConv network, combined with a depth image, to determine potentially dynamic regions of the image. Finally, to guarantee that only static feature points are used for position estimation, a dynamic probability is employed to determine the truly dynamic feature points based on the optical flow, semantic labels, and the state in the last frame. We have performed experiments on the TUM datasets to verify the feasibility of the algorithm. Compared with the classical dynamic visual SLAM algorithm, the experimental results demonstrate that the absolute trajectory error is greatly reduced in dynamic environments, and that the computing efficiency is improved by 31.54% compared with the real-time dynamic visual SLAM algorithm with close accuracy, demonstrating the superiority of DLD-SLAM in accuracy, stability, and efficiency.
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
This paper addresses visual simultaneous localization and mapping (VSLAM) in dynamic environments. Specifically, it proposes a new deep-learning-based RGB-D visual SLAM method, DLD-SLAM, that aims to improve localization accuracy, stability, and efficiency in dynamic environments. Traditional methods assume a largely static scene. When dynamic objects appear, especially objects with pronounced texture or objects that occupy a large proportion of the image, the accuracy and robustness of the system decrease significantly. The paper addresses these problems with deep-learning techniques, in particular lightweight networks.

The main contributions of DLD-SLAM include:

1. **Feature extraction and matching**: Building on the ORB-SLAM3 algorithm, the GCNv2-tiny network replaces the traditional ORB method, which improves the reliability of feature point extraction and matching and the accuracy of position estimation.
2. **Lightweight object detection**: A YOLOv5s network model optimized with the GSConv module reduces the number of network parameters and improves the computational efficiency of object detection. Its detections are combined with the depth information from the RGB-D camera to obtain masks of potentially dynamic targets.
3. **Dynamic feature point rejection strategy**: A dynamic probability, based on the LK (Lucas-Kanade) optical flow method, semantic labels, and the state in the previous frame, is used to identify and reject truly dynamic feature points, so that only static feature points are retained for position estimation and the interference of dynamic objects with localization is reduced.
4. **Experimental verification**: Experiments on the TUM dataset show that the absolute trajectory error in dynamic environments is significantly reduced compared with the classical dynamic visual SLAM algorithm, and that computational efficiency is improved by 31.54% compared with the real-time dynamic visual SLAM algorithm of similar accuracy.

In conclusion, by applying deep-learning techniques to feature extraction, object detection, and dynamic feature point processing, the paper improves the performance of VSLAM in dynamic environments.
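
The dynamic feature point rejection strategy combines three cues: optical flow, semantic labels, and the state in the previous frame. The summary does not give the paper's actual formula, so the following is only a minimal sketch of how such a fusion could look, assuming a simple weighted sum. The weights, the threshold, the flow-residual scale, and all names (`FeatureObservation`, `dynamic_probability`, `select_static`) are hypothetical and not taken from DLD-SLAM.

```python
from dataclasses import dataclass
from typing import List

# Illustrative assumptions only: none of these values come from the paper.
W_FLOW = 0.4               # weight of the optical-flow cue
W_SEMANTIC = 0.4           # weight of the semantic-label cue
W_PREVIOUS = 0.2           # weight of the previous-frame state
FLOW_RESIDUAL_SCALE = 2.0  # residual (pixels) at which the flow cue saturates
DYNAMIC_THRESHOLD = 0.5    # points at or above this score are rejected


@dataclass
class FeatureObservation:
    """Per-feature inputs, assumed to be computed elsewhere in the pipeline."""
    flow_residual_px: float      # optical-flow residual left after removing camera motion
    in_dynamic_region: bool      # True if the point lies in a potentially dynamic region
    previous_probability: float  # dynamic probability of the same point in the last frame


def dynamic_probability(obs: FeatureObservation) -> float:
    """Fuse the three cues into a score in [0, 1] with an assumed weighted sum."""
    flow_cue = min(max(obs.flow_residual_px, 0.0) / FLOW_RESIDUAL_SCALE, 1.0)
    semantic_cue = 1.0 if obs.in_dynamic_region else 0.0
    previous_cue = min(max(obs.previous_probability, 0.0), 1.0)
    return W_FLOW * flow_cue + W_SEMANTIC * semantic_cue + W_PREVIOUS * previous_cue


def select_static(observations: List[FeatureObservation]) -> List[FeatureObservation]:
    """Keep only the points whose score stays below the rejection threshold."""
    return [o for o in observations if dynamic_probability(o) < DYNAMIC_THRESHOLD]


if __name__ == "__main__":
    walking_person = FeatureObservation(3.0, True, 0.8)
    background = FeatureObservation(0.2, False, 0.0)
    seated_person = FeatureObservation(0.1, True, 0.2)
    for name, obs in [("walking", walking_person),
                      ("background", background),
                      ("seated", seated_person)]:
        print(name, round(dynamic_probability(obs), 2))
```

With these assumed weights, a point on a seated person (inside a potentially dynamic region but with almost no residual flow) scores about 0.46 and is kept, while a point on a walking person scores about 0.96 and is rejected. This illustrates why fusing the cues can retain more usable static points than rejecting every point that carries a dynamic semantic label.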