MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model

Changcheng Xiao, Qiong Cao, Zhigang Luo, Long Lan
2024-08-17
Abstract: Tracking by detection has been the prevailing paradigm in the field of Multi-object Tracking (MOT). These methods typically rely on the Kalman Filter to estimate the future locations of objects, assuming linear object motion. However, they fall short when tracking objects exhibiting nonlinear and diverse motion in scenarios like dancing and sports. In addition, there has been limited focus on utilizing learning-based motion predictors in MOT. To address these challenges, we explore data-driven motion prediction methods. Inspired by the promise of state space models (SSMs), such as Mamba, in long-term sequence modeling with near-linear complexity, we introduce a Mamba-based motion model named Mamba moTion Predictor (MTP). MTP is designed to model the complex motion patterns of objects like dancers and athletes. Specifically, MTP takes the spatial-temporal location dynamics of objects as input, captures the motion pattern using a bi-Mamba encoding layer, and predicts the next motion. In real-world scenarios, objects may be missed due to occlusion or motion blur, leading to premature termination of their trajectories. To tackle this challenge, we further expand the application of MTP. We employ it in an autoregressive way to compensate for missing observations by utilizing its own predictions as inputs, thereby contributing to more consistent trajectories. Our proposed tracker, MambaTrack, demonstrates advanced performance on benchmarks such as DanceTrack and SportsMOT, which are characterized by complex motion and severe occlusion.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses multi-object tracking (MOT) in complex scenarios where objects exhibit nonlinear and diverse motion patterns, as in dance and sports scenes. Existing tracking-by-detection methods usually rely on the Kalman filter to predict the future positions of objects, which assumes that object motion is linear. These methods perform poorly under nonlinear motion and frequent occlusion. In addition, learning-based motion predictors have received little attention in multi-object tracking. To address these challenges, the paper proposes a data-driven motion prediction method based on a state space model (SSM), named Mamba moTion Predictor (MTP). MTP aims to capture complex motion patterns such as those of dancers and athletes. Specifically, MTP takes the spatio-temporal position dynamics of an object as input, uses a bi-Mamba encoding layer to capture the motion pattern, and predicts the next motion. To handle objects that are lost because of occlusion or motion blur, the paper further extends the application of MTP: it is run autoregressively, with its own predictions fed back as input, to compensate for the missing observations and produce more consistent trajectories. The main contributions of the paper are:
1. A data-driven motion predictor, Mamba moTion Predictor (MTP), for modeling diverse motion patterns in complex scenarios.
2. A trajectory repair module that re-establishes lost trajectories by applying MTP in an autoregressive manner.
3. An online tracker, MambaTrack, that handles data association well in complex dance and sports scenes and achieves state-of-the-art performance on two benchmarks, DanceTrack and SportsMOT.
Through these innovations, MambaTrack can effectively handle scenarios with complex motion and severe occlusion, providing a new solution to improve the performance of multi-object tracking.
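The autoregressive compensation described above can be illustrated with a minimal sketch. This is not the paper's implementation: the constant-velocity predictor below is a simple stand-in for the learned MTP, and the box format (cx, cy, w, h), the function names, and the window size are all assumptions made for illustration. The only point it shows is the feedback loop, in which each prediction is appended to the history and used as input for the next step while detections are missing.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box format for illustration: (center x, center y, width, height).
Box = Tuple[float, float, float, float]


def constant_velocity_predictor(history: Sequence[Box]) -> Box:
    """Hypothetical stand-in for a learned predictor such as MTP.

    Extrapolates the last frame-to-frame offset (a linear-motion assumption).
    """
    if len(history) < 2:
        return history[-1]
    prev, last = history[-2], history[-1]
    return tuple(2 * l - p for l, p in zip(last, prev))


def autoregressive_fill(
    history: Sequence[Box],
    n_missing: int,
    predictor: Callable[[Sequence[Box]], Box],
    window: int = 5,  # assumed number of past boxes given to the predictor
) -> List[Box]:
    """Predict boxes for n_missing frames, feeding each prediction back as input."""
    buffer = list(history)
    filled: List[Box] = []
    for _ in range(n_missing):
        nxt = predictor(buffer[-window:])
        filled.append(nxt)
        buffer.append(nxt)  # the prediction stands in for the missing observation
    return filled


# Two observed boxes, then three frames with no detection.
observed = [(10.0, 20.0, 4.0, 8.0), (12.0, 21.0, 4.0, 8.0)]
gap = autoregressive_fill(observed, n_missing=3, predictor=constant_velocity_predictor)
print(gap)
```

Because the predictor is passed in as a callable, a learned model could in principle be swapped in for the stand-in without changing the rollout loop; how MambaTrack actually wires this together is not specified in this summary.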