Visual SLAMMOT Considering Multiple Motion Models

Peilin Tian, Hao Li
2024-11-28
Abstract: Simultaneous Localization and Mapping (SLAM) and Multi-Object Tracking (MOT) are pivotal tasks in the realm of autonomous driving, attracting considerable research attention. While SLAM endeavors to generate real-time maps and determine the vehicle's pose in unfamiliar settings, MOT focuses on the real-time identification and tracking of multiple dynamic objects. Despite their importance, the prevalent approach treats SLAM and MOT as independent modules within an autonomous vehicle system, leading to inherent limitations. Classical SLAM methodologies often rely on a static environment assumption, which suits indoor rather than dynamic outdoor scenarios. Conversely, conventional MOT techniques typically rely on the vehicle's known state, so the accuracy of object state estimates is constrained by this prior. To address these challenges, previous efforts introduced the unified SLAMMOT paradigm, yet primarily focused on simplistic motion patterns. In our team's previous work IMM-SLAMMOT\cite{IMM-SLAMMOT}, we presented a novel methodology that incorporates multiple motion models into SLAMMOT, i.e., tightly coupled SLAM and MOT, and demonstrated its efficacy in LiDAR-based systems. This paper studies the feasibility and advantages of instantiating this methodology as visual SLAMMOT, bridging the gap between LiDAR-based and vision-based sensing. Specifically, we propose a visual SLAMMOT solution that considers multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition
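The abstract notes that conventional MOT treats the vehicle's state as known, so object estimates inherit any error in that prior. A small calculation makes this concrete. The sketch below is a hypothetical 2-D illustration, not the paper's method: it transforms an object detected in the vehicle frame into the world frame and shows that a 1° yaw error in an ego pose assumed to be "known" displaces an object 30 m ahead by roughly 0.52 m (about 30 m × 0.0175 rad). The function name `ego_to_world` and all numbers are illustrative assumptions.

```python
import math


def ego_to_world(ego_pose, rel_xy):
    """Transform a point observed in the ego (vehicle) frame into the world frame.

    ego_pose: (x, y, yaw) of the vehicle in the world frame, yaw in radians.
    rel_xy:   (dx, dy) position of the detected object in the vehicle frame.
    """
    x, y, yaw = ego_pose
    dx, dy = rel_xy
    c, s = math.cos(yaw), math.sin(yaw)
    # 2-D rigid-body transform: rotate by yaw, then translate by the ego position
    return (x + c * dx - s * dy, y + s * dx + c * dy)


# Hypothetical scenario: an object detected 30 m straight ahead of the vehicle.
true_pose = (0.0, 0.0, 0.0)
rel = (30.0, 0.0)
true_obj = ego_to_world(true_pose, rel)

# A tracker that trusts a slightly wrong ego pose (1 degree yaw error)
# places the object in the wrong world position.
biased_pose = (0.0, 0.0, math.radians(1.0))
biased_obj = ego_to_world(biased_pose, rel)

err = math.dist(true_obj, biased_obj)
print(f"object position error: {err:.3f} m")  # prints: object position error: 0.524 m
```

Because this error scales with range, a decoupled tracker cannot correct it on its own. This is the motivation for estimating the ego pose and object states jointly, as the SLAMMOT paradigm does.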
What problem does this paper attempt to address?
The main problem this paper addresses is the limitation of running SLAM (Simultaneous Localization and Mapping) and MOT (Multi-Object Tracking) as independent modules in autonomous driving. Specifically:

1. **Limitations of classical SLAM methods**: Most existing SLAM methods rely on the static environment assumption, which holds in indoor environments but performs poorly in dynamic outdoor scenarios. Moving vehicles and other dynamic objects can interfere with the vehicle's localization and introduce spurious motion information into the constructed map.
2. **Limitations of traditional MOT methods**: Traditional MOT techniques usually estimate object states from a known vehicle state, so the accuracy of object state estimation is bounded by the accuracy of the vehicle state. In real-world driving, SLAM and MOT must both run in real time and are intrinsically related and interdependent.
3. **Deficiencies of existing SLAMMOT methods**: Previous research proposed unifying SLAM and MOT into a more general SLAMMOT paradigm, but these methods usually consider only simple object motion patterns, such as a single constant-velocity model. In real environments, objects move in complex ways and switch between different motion patterns, so a single simple motion model cannot describe their states adequately.

To address these challenges, the paper instantiates the team's earlier IMM-SLAMMOT methodology, which tightly couples SLAM and MOT while considering multiple motion models, as a visual SLAMMOT system, aiming to bridge the gap between LiDAR (Light Detection and Ranging) and vision-based perception.
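Point 3 argues that one motion model cannot describe an object that switches between motion patterns. A common way to handle such switching is an Interacting Multiple Model (IMM) filter, which runs one Kalman filter per motion model and tracks the probability that each model is active. The sketch below is a textbook IMM on a 1-D toy problem, not the paper's visual SLAMMOT implementation. The two models ("stopped" and "constant velocity"), the noise values, the switching matrix, and all function names are assumptions chosen for illustration, and the ego pose is taken as exact here, which is precisely the simplification that SLAMMOT removes.

```python
import math


# --- tiny matrix helpers (pure Python, no NumPy) ---
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kf_predict(x, P, F, Q):
    """Kalman prediction for state x = [[pos], [vel]] under transition matrix F."""
    return mat_mul(F, x), mat_add(mat_mul(mat_mul(F, P), transpose(F)), Q)


def kf_update(x, P, z, meas_var):
    """Kalman update with a scalar position measurement (H = [1, 0]).
    Returns the updated state, covariance and Gaussian measurement likelihood."""
    y = z - x[0][0]                      # innovation
    S = P[0][0] + meas_var               # innovation variance
    K = [P[0][0] / S, P[1][0] / S]       # Kalman gain
    x = [[x[0][0] + K[0] * y], [x[1][0] + K[1] * y]]
    P = [[P[i][j] - K[i] * P[0][j] for j in range(2)] for i in range(2)]
    likelihood = math.exp(-0.5 * y * y / S) / math.sqrt(2.0 * math.pi * S)
    return x, P, likelihood


def imm_step(xs, Ps, mu, trans, models, z, meas_var):
    """One IMM cycle: mixing, model-matched filtering, probability update, combination."""
    n = len(models)
    # Predicted model probabilities c[j]; mixing weights w[i][j] = P(model i before | model j now)
    c = [sum(trans[i][j] * mu[i] for i in range(n)) for j in range(n)]
    w = [[trans[i][j] * mu[i] / c[j] for j in range(n)] for i in range(n)]
    new_xs, new_Ps, lik = [], [], []
    for j, (F, Q) in enumerate(models):
        # Mix the previous per-model estimates into the starting point for model j
        x0 = [[sum(w[i][j] * xs[i][k][0] for i in range(n))] for k in range(2)]
        P0 = [[0.0, 0.0], [0.0, 0.0]]
        for i in range(n):
            d = [xs[i][k][0] - x0[k][0] for k in range(2)]
            for a in range(2):
                for b in range(2):
                    P0[a][b] += w[i][j] * (Ps[i][a][b] + d[a] * d[b])
        # Run the Kalman filter matched to motion model j
        x, P = kf_predict(x0, P0, F, Q)
        x, P, L = kf_update(x, P, z, meas_var)
        new_xs.append(x)
        new_Ps.append(P)
        lik.append(L)
    # Bayesian update of the model probabilities
    unnorm = [lik[j] * c[j] for j in range(n)]
    total = sum(unnorm)
    mu = [u / total for u in unnorm]
    # Probability-weighted combined estimate (what the tracker would report)
    x_comb = [[sum(mu[j] * new_xs[j][k][0] for j in range(n))] for k in range(2)]
    return new_xs, new_Ps, mu, x_comb


# Two hypothetical motion models for a 1-D object, dt = 1 s: (F, Q) pairs.
dt = 1.0
STOPPED = ([[1.0, 0.0], [0.0, 0.0]], [[0.01, 0.0], [0.0, 1e-4]])   # velocity forced to zero
CONST_VEL = ([[1.0, dt], [0.0, 1.0]], [[0.01, 0.0], [0.0, 0.1]])   # constant velocity
models = [STOPPED, CONST_VEL]
trans = [[0.95, 0.05], [0.05, 0.95]]  # assumed Markov switching probabilities

xs = [[[0.0], [0.0]] for _ in models]
Ps = [[[1.0, 0.0], [0.0, 1.0]] for _ in models]
mu = [0.5, 0.5]

# Noise-free synthetic track: parked for 10 steps, then moving at 1 m/s.
for _ in range(10):
    xs, Ps, mu, x_comb = imm_step(xs, Ps, mu, trans, models, 0.0, 0.01)
mu_after_stop = mu
for t in range(1, 16):
    xs, Ps, mu, x_comb = imm_step(xs, Ps, mu, trans, models, float(t), 0.01)

print("P(stopped) while parked:", round(mu_after_stop[0], 3))
print("P(const-vel) while moving:", round(mu[1], 3))
print("estimated velocity:", round(x_comb[1][0], 3))
```

The model probabilities shift from "stopped" to "constant velocity" once the object starts moving, which a single constant-velocity filter cannot express. A realistic tracker would use richer models, such as a constant turn-rate model, over a full 2-D or 3-D object state.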
Specifically, the contributions of the paper include:

- Providing a solution that demonstrates the feasibility of instantiating Methodology Level 3 (i.e., tightly coupled SLAM and MOT considering multiple motion models) as visual SLAMMOT.
- Verifying the advantages of Methodology Level 3 in visual SLAMMOT and showing its superior performance compared to other methods.

Through this approach, the paper aims to improve the performance of SLAM and MOT in dynamic outdoor environments, especially for autonomous driving applications.