MotionTrack: Learning Motion Predictor for Multiple Object Tracking

Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao
2024-03-11
Abstract: Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT. In this context, we introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor that relies solely on object trajectory information. This predictor comprehensively integrates two levels of granularity in motion features to enhance the modeling of temporal dynamics and facilitate precise future motion prediction for individual objects. Specifically, the proposed approach adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT, characterized by highly complex object motion.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses tracking accuracy in multi-object tracking (MOT) when objects have similar appearances and complex motion patterns. It focuses on two main problems:

1. **Insufficient Discriminative Power of ReID Features**: Many existing MOT methods rely on re-identification (ReID) features to tell objects apart. When the objects look alike, such as dancers in a group dance or athletes in sports scenes, these features often fail to provide enough discriminative information.
2. **Limitations of Linear Motion Models**: Most existing MOT methods depend on linear motion models, such as the Kalman filter, which perform poorly on nonlinear motion. For example, under fast movement or sudden changes in direction, linear models often fail to predict the future positions of objects accurately.

To address these challenges, the paper proposes a motion-based tracker called MotionTrack. The core of this tracker is a learnable motion predictor that relies solely on the trajectory information of objects. The predictor uses a self-attention mechanism to capture token-level information and a Dynamic Multi-Layer Perceptron (Dynamic MLP) layer to model channel-level features, which strengthens the modeling of temporal dynamics and supports precise prediction of future motion. Experimental results show that MotionTrack achieves state-of-the-art performance on datasets with highly complex motion patterns, such as DanceTrack and SportsMOT.
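The limitation of linear motion models described above can be illustrated with a small sketch. This is not MotionTrack's predictor and not a full Kalman filter. It is a plain constant-velocity extrapolation, used here as a stand-in for a linear motion model. The function names (`constant_velocity_predict`, `straight_track`, `circular_track`, `prediction_error`) and the toy trajectories are assumptions made for illustration only.

```python
import numpy as np


def constant_velocity_predict(track, horizon=1):
    """Extrapolate the last observed displacement `horizon` steps ahead.

    `track` is an (N, 2) sequence of object centers. This is a simplified
    linear motion model, not the learned predictor from the paper.
    """
    track = np.asarray(track, dtype=float)
    velocity = track[-1] - track[-2]  # last per-frame displacement
    return track[-1] + horizon * velocity


def straight_track(n_steps, velocity=(1.0, 0.5)):
    """Hypothetical object moving in a straight line at constant speed."""
    steps = np.arange(n_steps)[:, None]
    return steps * np.asarray(velocity)


def circular_track(n_steps, radius=10.0, step_angle=np.pi / 8):
    """Hypothetical object turning continuously, e.g. a spinning dancer."""
    angles = np.arange(n_steps) * step_angle
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)


def prediction_error(track, horizon=1):
    """Hide the last `horizon` frames, predict the final one, return the distance."""
    observed, target = track[:-horizon], track[-1]
    predicted = constant_velocity_predict(observed, horizon)
    return float(np.linalg.norm(predicted - target))


straight_err = prediction_error(straight_track(10), horizon=3)
curved_err = prediction_error(circular_track(10), horizon=3)
print(f"straight-line error: {straight_err:.3f}")
print(f"turning-motion error: {curved_err:.3f}")
```

On the straight track the extrapolation lands on the true position, while on the turning track the error grows with the prediction horizon. Gaps of several frames occur when an object is occluded or missed by the detector, which is the situation where a predictor that models nonlinear trajectory dynamics is expected to help.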