A real-time, robust and versatile visual-SLAM framework based on deep learning networks

Zhang Xiao, Shuaixin Li
2024-06-04
Abstract: This paper explores how deep learning techniques can improve visual SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weakly textured areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods, to inform other research. Through extensive experiments on both public datasets and self-collected data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at https://github.com/zzzzxxxx111/SLslam.
Robotics
What problem does this paper attempt to address?
The paper primarily aims to improve the performance of Visual Simultaneous Localization and Mapping (vSLAM) systems under challenging environmental conditions. Specifically, it covers the following points:

1. **Problems with existing vSLAM systems**: Traditional vSLAM systems struggle in complex scenarios such as dynamic lighting, low-texture regions, and significant camera shake. Conventional feature extraction algorithms focus on local image information and neglect structural and semantic details, which reduces localization accuracy and makes tracking unstable.
2. **Application of deep learning technology**: Deep learning has substantially changed computer vision in recent years. Trained on large-scale data, it can capture complex scene structure and semantic information, which makes environmental perception more capable. There are two main approaches to applying deep learning to vSLAM, and each has drawbacks: end-to-end methods often have high computational costs and weak real-time tracking capabilities; hybrid vSLAM can balance the advantages of geometric constraints and semantic understanding, but existing hybrid methods still do not fully address the challenges of complex environments.
3. **Proposed new framework Rover-SLAM**: To overcome these challenges, the authors propose a real-time, robust, and versatile visual SLAM framework named Rover-SLAM. The framework integrates the state-of-the-art deep feature extraction module SuperPoint and the feature matching method LightGlue, and uses these deep learning methods consistently throughout the SLAM system to improve the performance of its different tasks. Rover-SLAM also supports several sensor configurations, including monocular, stereo, monocular-inertial, and stereo-inertial setups, to suit different application scenarios.
4. **Experimental validation**: Extensive experiments on public datasets and self-collected data demonstrate significant improvements in trajectory estimation accuracy and tracking stability with Rover-SLAM. The results show that the system performs well in a variety of challenging scenarios and outperforms comparable existing systems.

In summary, this paper aims to enhance the robustness and accuracy of vSLAM systems in complex environments by integrating deep learning throughout the pipeline, thereby advancing research and development in this field.
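
To make the hybrid idea in point 3 more concrete, the following is a minimal sketch of the data-association step that sits between a learned feature extractor and the geometric back end. It is not the authors' code and it is not LightGlue, which is a learned, attention-based matcher. As a simple stand-in, it uses plain mutual nearest-neighbour matching on cosine similarity. The function names, the `min_similarity` threshold, and the toy descriptors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2_normalize(vec: Descriptor) -> List[float]:
    """Scale a descriptor to unit length (all-zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def cosine_similarity_matrix(
    desc_a: Sequence[Descriptor], desc_b: Sequence[Descriptor]
) -> List[List[float]]:
    """Return sim[i][j] = cosine similarity between desc_a[i] and desc_b[j]."""
    a = [l2_normalize(d) for d in desc_a]
    b = [l2_normalize(d) for d in desc_b]
    return [[sum(x * y for x, y in zip(da, db)) for db in b] for da in a]


def mutual_nearest_matches(
    desc_a: Sequence[Descriptor],
    desc_b: Sequence[Descriptor],
    min_similarity: float = 0.8,  # hypothetical threshold, not from the paper
) -> List[Tuple[int, int, float]]:
    """Match descriptors of two frames by mutual nearest neighbour.

    A pair (i, j) is kept only if j is the best match for i, i is the
    best match for j, and their similarity reaches min_similarity.
    """
    if not desc_a or not desc_b:
        return []
    sim = cosine_similarity_matrix(desc_a, desc_b)
    n_a, n_b = len(desc_a), len(desc_b)
    best_b_for_a = [max(range(n_b), key=lambda j: sim[i][j]) for i in range(n_a)]
    best_a_for_b = [max(range(n_a), key=lambda i: sim[i][j]) for j in range(n_b)]
    matches = []
    for i, j in enumerate(best_b_for_a):
        if best_a_for_b[j] == i and sim[i][j] >= min_similarity:
            matches.append((i, j, sim[i][j]))
    return matches


# Toy descriptors standing in for the output of a learned extractor.
frame_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame_b = [[0.0, 0.9, 0.1], [1.0, 0.05, 0.0], [0.5, 0.5, 0.5]]
matches = mutual_nearest_matches(frame_a, frame_b)
```

In a hybrid system, the surviving index pairs would then feed the geometric stages (pose estimation, triangulation, loop closure). The ambiguous third descriptor in `frame_b` is rejected here, which illustrates why a match filter matters in weakly textured scenes.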