Abstract: Simultaneous Localization and Mapping (SLAM) and Multi-Object Tracking (MOT) are pivotal tasks in autonomous driving and attract considerable research attention. SLAM aims to build a map in real time and determine the vehicle's pose in unfamiliar settings, while MOT focuses on the real-time identification and tracking of multiple dynamic objects. Despite their importance, the prevalent approach treats SLAM and MOT as independent modules within an autonomous vehicle system, which leads to inherent limitations. Classical SLAM methodologies often rely on a static-environment assumption, which suits indoor scenes better than dynamic outdoor scenarios. Conversely, conventional MOT techniques typically assume the vehicle's state is known, so the accuracy of object state estimates is constrained by this prior. To address these challenges, previous efforts introduced the unified SLAMMOT paradigm, yet focused primarily on simplistic motion patterns. In our team's previous work IMM-SLAMMOT\cite{IMM-SLAMMOT}, we presented a novel methodology that incorporates multiple motion models into SLAMMOT, i.e., tightly coupled SLAM and MOT, and demonstrated its efficacy in LiDAR-based systems. This paper studies the feasibility and advantages of instantiating this methodology as visual SLAMMOT, bridging the gap between LiDAR and vision-based sensing mechanisms. Specifically, we propose a visual SLAMMOT solution that considers multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.

What problem does this paper attempt to address?

The main problem this paper addresses is the limitations of existing SLAM (Simultaneous Localization and Mapping) and MOT (Multi-Object Tracking) when they operate as independent modules in autonomous driving. Specifically:

1. **Limitations of classical SLAM methods**: Most existing SLAM methods rely on the static-environment assumption, which holds in indoor environments but performs poorly in dynamic outdoor scenarios. Moving vehicles and other dynamic objects may interfere with the vehicle's localization and introduce unnecessary motion information into the constructed environmental map.
2. **Limitations of traditional MOT methods**: Traditional MOT techniques usually rely on a known vehicle state to estimate the state of an object, so the accuracy of the object state estimate is limited by the accuracy of the vehicle state. In real-world driving scenarios, SLAM and MOT both need to run in real time and are intrinsically related and interdependent.
3. **Deficiencies of existing SLAMMOT methods**: Although previous research has proposed unifying SLAM and MOT in one framework and put forward a more general SLAMMOT paradigm, these methods often consider only simple object motion patterns, such as a single constant-velocity model. In real environments, an object's motion is usually complex and switches between different patterns, so a single simple motion model is not sufficient to describe the object's state.

To address these challenges, the paper proposes a visual SLAMMOT method that considers multiple motion models, instantiating the IMM-SLAMMOT methodology in the visual domain and aiming to bridge the gap between LiDAR (Light Detection and Ranging) and vision-based perception mechanisms.
Specifically, the contributions of the paper include:

- Providing a solution that demonstrates the feasibility of instantiating Methodology Level 3 (i.e., tightly coupled SLAM and MOT considering multiple motion models) as visual SLAMMOT.
- Verifying the advantages of Methodology Level 3 in visual SLAMMOT and showing its superior performance compared to other methods.

Through this method, the paper attempts to improve the performance of SLAM and MOT in dynamic outdoor environments, especially for autonomous driving applications.
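To make the point about multiple motion models concrete, the following is a minimal sketch of how an interacting-multiple-model (IMM) style scheme can reweight competing motion hypotheses for a tracked object. It is not the paper's implementation. Everything here is an illustrative assumption: the two models (constant velocity, and constant turn rate and velocity), the function names, the transition matrix, the noise level `sigma`, and the simplification that each model predicts from the previous true state with noise-free measurements. A full IMM filter would also run one estimator per model and mix their state estimates, which is omitted.

```python
import math


def predict_cv(state, dt):
    """Constant-velocity prediction; state = (x, y, heading, speed)."""
    x, y, heading, speed = state
    return (x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt,
            heading, speed)


def predict_ctrv(state, yaw_rate, dt):
    """Constant turn-rate and velocity prediction along a circular arc."""
    x, y, heading, speed = state
    if abs(yaw_rate) < 1e-9:
        return predict_cv(state, dt)  # straight-line limit
    new_heading = heading + yaw_rate * dt
    radius = speed / yaw_rate
    return (x + radius * (math.sin(new_heading) - math.sin(heading)),
            y - radius * (math.cos(new_heading) - math.cos(heading)),
            new_heading, speed)


def gaussian_likelihood(residual, sigma):
    """Isotropic 2D Gaussian likelihood of a position residual."""
    dx, dy = residual
    var = sigma * sigma
    return math.exp(-(dx * dx + dy * dy) / (2.0 * var)) / (2.0 * math.pi * var)


def update_model_probs(probs, transition, likelihoods):
    """Markov prediction of model probabilities, then Bayes reweighting."""
    n = len(probs)
    predicted = [sum(transition[i][j] * probs[i] for i in range(n))
                 for j in range(n)]
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def demo(true_yaw_rate=0.5, steps=10, dt=0.1, sigma=0.02):
    """Return [P(CV), P(CTRV)] after observing an object for `steps` frames."""
    transition = [[0.95, 0.05], [0.05, 0.95]]  # assumed model-switch prior
    probs = [0.5, 0.5]
    state = (0.0, 0.0, 0.0, 10.0)  # at origin, heading 0 rad, 10 m/s
    for _ in range(steps):
        truth = predict_ctrv(state, true_yaw_rate, dt)  # simulated observation
        preds = [predict_cv(state, dt), predict_ctrv(state, 0.5, dt)]
        likelihoods = [
            gaussian_likelihood((truth[0] - p[0], truth[1] - p[1]), sigma)
            for p in preds
        ]
        probs = update_model_probs(probs, transition, likelihoods)
        state = truth  # simplification: no per-model filter state
    return probs


if __name__ == "__main__":
    print("turning object:", demo(true_yaw_rate=0.5))
    print("straight object:", demo(true_yaw_rate=0.0))
```

In this toy setup the probability mass shifts toward the turning model when the simulated object turns and toward the constant-velocity model when it drives straight, which is the behavior a single fixed motion model cannot reproduce.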

Visual SLAMMOT Considering Multiple Motion Models

LiDAR SLAMMOT based on Confidence-guided Data Association

LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking

DMOT-SLAM: Visual SLAM in Dynamic Environments with Moving Object Tracking

MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation

MCOV-SLAM: A Multicamera Omnidirectional Visual SLAM System

GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking With Camera-LiDAR Fusion

Multi-Classes and Motion Properties for Concurrent Visual SLAM in Dynamic Environments

Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles

Robust Multi-Modal Multi-LiDAR-Inertial Odometry and Mapping for Indoor Environments

Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation

MN-SLAM: Multi-networks Visual SLAM for Dynamic and Complicated Environments

HVL-SLAM: Hybrid Vision and LiDAR Fusion for SLAM

MISD-SLAM: Multimodal Semantic SLAM for Dynamic Environments

Monocular Visual-Inertial Navigation for Dynamic Environment

Visual-LiDAR SLAM Based on Unsupervised Multi-channel Deep Neural Networks

DVI-SLAM: A Dual Visual Inertial SLAM Network

DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes

Hybrid Motion Model for Multiple Object Tracking in Mobile Devices