Multi-object Tracking by Detection and Query: an efficient end-to-end manner

Shukun Jia, Yichao Cao, Feng Yang, Xin Lu, Xiaobo Lu
2024-11-09
Abstract: Multi-object tracking is advancing through two dominant paradigms: traditional tracking by detection and newly emerging tracking by query. In this work, we fuse them together and propose the tracking-by-detection-and-query paradigm, which is achieved by a Learnable Associator. Specifically, the basic information interaction module and the content-position alignment module are proposed for thorough information Interaction among object queries. Tracking results are directly Decoded from these queries. Hence, we name the method as LAID. Compared to tracking-by-query models, LAID achieves competitive tracking accuracy with notably higher training efficiency. With regard to tracking-by-detection methods, experimental results on DanceTrack show that LAID significantly surpasses the state-of-the-art heuristic method by 3.9% on HOTA metric and 6.1% on IDF1 metric. On SportsMOT, LAID also achieves the best score on HOTA metric. By holding low training cost, strong tracking capabilities, and an elegant end-to-end approach all at once, LAID presents a forward-looking direction for the field.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the limitations of the two main paradigms in the field of multi-object tracking (MOT): the traditional tracking-by-detection and the emerging tracking-by-query. Specifically:

1. **Traditional tracking-by-detection paradigm**:
   - **Modularity problem**: This paradigm treats detection and association tasks separately. Although the structure is clear, it lacks integrity, resulting in poor performance in complex scenarios.
   - **Feature extraction problem**: The combination of appearance information and motion patterns is not tight enough. An additional ReID model needs to be trained, and these models may not achieve optimal performance in the MOT setting.
   - **Limitations of heuristic methods**: It relies on manually designed hyper-parameters and complex multi-step settings, and it is difficult to cope with various motion patterns and severe occlusions in complex scenarios.
2. **Emerging tracking-by-query paradigm**:
   - **Coupling problem**: This paradigm performs detection and association tasks simultaneously. Although the association ability is significantly improved, the coupling of the two tasks makes training efficiency low.
   - **Inefficient training process**: The detection task and the association task differ. For example, the detection task focuses on a single-frame image, while the association task requires continuous frames to learn temporal cues. This mismatch makes the training process inefficient.

To solve these problems, the paper proposes a new paradigm, **tracking-by-detection-and-query**, and introduces a learnable associator. This associator can fuse the advantages of the two paradigms, maintaining a clear structure of detection and association tasks, and achieving strong association ability and efficient end-to-end training.
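
To make the point about manually designed hyper-parameters concrete, here is a minimal sketch of a generic heuristic association step of the kind used in tracking-by-detection pipelines: greedy matching on box IoU with a hand-tuned threshold. It is an illustration only, not any specific tracker from the paper; the function names and the default `iou_threshold` value are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    iou_threshold: float = 0.3,  # hand-tuned value (assumed for illustration)
) -> Dict[int, int]:
    """Return {track_id: detection_index} using greedy IoU matching."""
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        (
            (iou(t_box, d_box), t_id, d_idx)
            for t_id, t_box in tracks.items()
            for d_idx, d_box in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, t_id, d_idx in pairs:
        if score < iou_threshold:
            break  # all remaining pairs overlap even less
        if t_id in matches or d_idx in used_detections:
            continue  # each track and each detection is matched at most once
        matches[t_id] = d_idx
        used_detections.add(d_idx)
    return matches
```

The outcome depends entirely on box overlap and on the chosen threshold, which is why such rules can struggle with irregular motion and heavy occlusion.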
### Specific solutions

- **Basic Information Interaction (BII) module**: Promotes information exchange in the content part between detection queries and tracking queries.
- **Content-Position Alignment (CPA) module**: Updates the position part of object queries and aligns it with the content part.
- **Association decoder**: Decodes the prediction results directly from the object queries after interaction, making the method fully end-to-end.

### Main contributions

1. Proposed LAID, a method built around a Learnable Associator that realizes the new tracking-by-detection-and-query paradigm.
2. Designed the BII module and the CPA module to ensure the effectiveness of LAID.
3. Verified the performance of LAID on multiple large-scale datasets, in particular its balance between training efficiency and tracking accuracy.

Through these contributions, LAID improves multi-object tracking performance while offering a more efficient, end-to-end solution.
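
The idea of letting detection queries and tracking queries interact, and then decoding associations from them, can be sketched with plain scaled dot-product attention. The code below is a toy illustration under strong assumptions: it uses no learned weights, and the `Query` class and the `interact` and `decode` functions are hypothetical names, not the paper's BII, CPA, or association decoder. In a trained model the similarity would come from learned projections rather than raw dot products.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    content: List[float]  # appearance-like embedding (content part)
    box: List[float]      # (x1, y1, x2, y2) (position part)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(track: Query, dets: List[Query]) -> List[float]:
    """Scaled dot-product attention of one track query over detection queries."""
    scale = math.sqrt(len(track.content))
    logits = [
        sum(t * d for t, d in zip(track.content, det.content)) / scale
        for det in dets
    ]
    return softmax(logits)


def interact(track: Query, dets: List[Query]) -> Query:
    """Toy stand-in for content interaction plus position alignment."""
    w = attention_weights(track, dets)
    dim = len(track.content)
    # Content: residual update with the attention-weighted detection content.
    attended = [sum(w[j] * dets[j].content[k] for j in range(len(dets))) for k in range(dim)]
    content = [t + a for t, a in zip(track.content, attended)]
    # Position: move the box toward the detections the content attends to.
    box = [sum(w[j] * dets[j].box[k] for j in range(len(dets))) for k in range(4)]
    return Query(content=content, box=box)


def decode(tracks: List[Query], dets: List[Query]) -> List[int]:
    """For each track query, return the index of its best-matching detection."""
    result = []
    for track in tracks:
        w = attention_weights(track, dets)
        result.append(max(range(len(dets)), key=lambda j: w[j]))
    return result
```

Unlike a fixed IoU rule, the matching here is driven by query content, so a learned version of the same computation can be trained end to end on association alone while the detector stays a separate component.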