Abstract: This paper explores how deep learning techniques can improve the performance of visual SLAM in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weakly textured areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods, to inform future research. Through extensive experiments on both public datasets and self-collected data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at https://github.com/zzzzxxxx111/SLslam.

What problem does this paper attempt to address?

The paper primarily aims to improve the performance of Visual Simultaneous Localization and Mapping (vSLAM) systems under challenging environmental conditions. Specifically, it covers the following points:

1. **Problems with existing vSLAM systems**: Traditional vSLAM systems struggle in complex scenarios such as dynamic lighting conditions, low-texture regions, and significant camera shake. Conventional feature extraction algorithms focus on local information in images while neglecting structural and semantic details, which leads to decreased localization accuracy and unstable tracking.
2. **Application of deep learning technology**: In recent years, deep learning has substantially changed the field of computer vision. Trained on large-scale data, deep networks can capture complex scene structure and semantic information, which makes environmental perception more capable. There are two main approaches to applying deep learning to vSLAM. End-to-end methods often have high computational costs and weak real-time tracking capabilities. Hybrid vSLAM can balance the advantages of geometric constraints and semantic understanding, but existing hybrid methods still fall short of fully addressing the challenges of complex environments.
3. **Proposed new framework Rover-SLAM**: To overcome these challenges, the authors propose a real-time, robust, and multifunctional visual SLAM framework named Rover-SLAM. The framework integrates the deep learning feature extraction module SuperPoint and the feature matching method LightGlue, and applies these deep learning methods uniformly throughout the SLAM system to improve performance across its different tasks. Rover-SLAM also supports various sensor configurations, including monocular, stereo, monocular-inertial, and stereo-inertial setups, to meet the needs of different application scenarios.
4. **Experimental validation**: Extensive experiments on public datasets and self-collected data demonstrate significant improvements in trajectory estimation accuracy and tracking stability with Rover-SLAM. The experimental results show that the system performs well in various challenging scenarios and outperforms existing similar systems.

In summary, this paper aims to enhance the robustness and accuracy of vSLAM systems in complex environments through the integrated application of deep learning, thereby advancing research in this field.
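To make the hybrid idea in point 3 more concrete, the sketch below shows the kind of step a learned matcher replaces: pairing feature descriptors from two frames so that a geometric back end can use the correspondences. It is only an illustration under stated assumptions. It uses plain mutual nearest-neighbor matching on synthetic unit-norm descriptors, not SuperPoint or LightGlue, and the names `normalize`, `mutual_nearest_matches`, `demo`, and the `min_similarity` threshold are hypothetical and not taken from the paper or its code.

```python
import math
import random


def normalize(vec):
    """Scale a descriptor to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mutual_nearest_matches(desc_a, desc_b, min_similarity=0.7):
    """Match unit-norm descriptors by mutual nearest neighbor.

    A simple stand-in for a learned matcher; this is NOT LightGlue.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    if not desc_a or not desc_b:
        return []
    # Cosine similarity between every pair of descriptors.
    sim = [[sum(x * y for x, y in zip(a, b)) for b in desc_b] for a in desc_a]
    # Best candidate in B for each descriptor in A, and the reverse.
    best_b = [max(range(len(desc_b)), key=row.__getitem__) for row in sim]
    best_a = [max(range(len(desc_a)), key=lambda i: sim[i][j])
              for j in range(len(desc_b))]
    matches = []
    for i, j in enumerate(best_b):
        # Keep a pair only if each side picks the other and the score is high.
        if best_a[j] == i and sim[i][j] >= min_similarity:
            matches.append((i, j))
    return matches


def demo(num_points=30, dim=128, seed=0):
    """Build synthetic descriptors: set B is a shuffled, noisy copy of set A."""
    rng = random.Random(seed)
    desc_a = [normalize([rng.gauss(0, 1) for _ in range(dim)])
              for _ in range(num_points)]
    perm = list(range(num_points))
    rng.shuffle(perm)
    # desc_b[j] is a slightly perturbed copy of desc_a[perm[j]].
    desc_b = [normalize([x + rng.gauss(0, 0.01) for x in desc_a[p]])
              for p in perm]
    return mutual_nearest_matches(desc_a, desc_b), perm
```

In a real hybrid system the descriptors would come from a learned extractor and the matcher would reason about keypoint context as well; the mutual check and similarity threshold here are just a common, simple way to reject ambiguous pairs before pose estimation.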

A real-time, robust and versatile visual-SLAM framework based on deep learning networks

A robust stereo feature-aided semi-direct SLAM system

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

Light-SLAM: A Robust Deep-Learning Visual SLAM System Based on LightGlue under Challenging Lighting Conditions

DF-SLAM: A Deep-Learning Enhanced Visual SLAM System based on Deep Local Features

A Robust Deep Learning Enhanced Monocular SLAM System for Dynamic Environments

Real-Time Dynamic SLAM Algorithm Based on Deep Learning

A Monocular Visual SLAM System Augmented by Lightweight Deep Local Feature Extractor Using In-House and Low-Cost LIDAR-camera Integrated Device

Dynamic-SLAM: Semantic monocular visual localization and mapping based on deep learning in dynamic environment

BASL-AD SLAM: A Robust Deep-Learning Feature-Based Visual SLAM System With Adaptive Motion Model

A Survey of Deep Learning Application in Dynamic Visual SLAM

A deep-learning real-time visual SLAM system based on multi-task feature extraction network and self-supervised feature points

A Real-Time Dynamic SLAM Algorithm Based on the Fusion of Visual, Inertial, and Semantic Information

ADM-SLAM: Accurate and Fast Dynamic Visual SLAM with Adaptive Feature Point Extraction, Deeplabv3pro, and Multi-View Geometry

Semantic visual simultaneous localization and mapping (SLAM) using deep learning for dynamic scenes

Real-Time Visual-Inertial Localization Using Semantic Segmentation Towards Dynamic Environments

DRV-SLAM: An Adaptive Real-Time Semantic Visual SLAM Based on Instance Segmentation Toward Dynamic Environments

SuperVINS: A Real-Time Visual-Inertial SLAM Framework for Challenging Imaging Conditions