RTAT: A Robust Two-stage Association Tracker for Multi-Object Tracking

Song Guo, Rujie Liu, Narishige Abe
2024-08-14
Abstract: Data association is an essential part of tracking-by-detection Multi-Object Tracking (MOT). Most trackers focus on designing a better data association strategy to improve tracking performance. Rule-based handcrafted association methods are simple and highly efficient but lack the generalization capability to deal with complex scenes. Learnt association methods can learn high-order contextual information to deal with various complex scenes, but they have higher complexity and cost. To address these limitations, we propose a Robust Two-stage Association Tracker, named RTAT. The first-stage association is performed between tracklets and detections to generate tracklets with high purity, and the second-stage association is performed between tracklets to form complete trajectories. For the first-stage association, we use a simple data association strategy and set a low threshold for the matching cost in the assignment process, so that the resulting tracklets have high purity. We conduct the second-stage tracklet association within the framework of a message-passing GNN. Our method models tracklet association as a series of edge classification problems in hierarchical graphs, which recursively merges short tracklets into longer ones. Our tracker RTAT ranks first on the test sets of the MOT17 and MOT20 benchmarks in most of the main MOT metrics: HOTA, IDF1, and AssA. We achieve 67.2 HOTA, 84.7 IDF1, and 69.7 AssA on MOT17, and 66.2 HOTA, 82.5 IDF1, and 68.1 AssA on MOT20.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the data association problem in multi-object tracking (MOT). Specifically, the authors point out two main limitations of existing methods:

1. **Rule-based handcrafted association methods**: These methods are simple and efficient, but they generalize poorly to complex scenarios. For example, handcrafted rules struggle in crowded scenes, under fast camera movement, at night, or at low resolution.
2. **Learning-based association methods**: These methods can learn high-order contextual information and thus handle complex scenarios better, but they have higher complexity and cost. In particular, training these models requires a large amount of data, while existing MOT datasets are of limited scale.

To address these limitations, the authors propose a robust two-stage association tracker named RTAT (Robust Two-stage Association Tracker). The main innovations of this method are:

- **First-stage association**: Detections are associated with tracklets to generate high-purity tracklets. Setting a low threshold on the matching cost keeps the generated tracklets pure, which reduces identity switches.
- **Second-stage association**: Tracklets are associated with each other to form complete trajectories. This stage uses a message-passing Graph Neural Network (GNN) framework to model tracklet association as a series of edge classification problems in hierarchical graphs, recursively merging short tracklets into longer ones.

The method combines the efficiency of the handcrafted approach with the strong generalization ability of the learning-based approach, while reducing complexity and computational cost. Experimental results show that RTAT achieves excellent performance on the MOT17 and MOT20 benchmarks, particularly on the key metrics HOTA, IDF1, and AssA.
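To make the first-stage idea concrete, here is a minimal sketch of threshold-gated association between tracklets and detections. It is not the authors' implementation: the cost (1 − IoU), the greedy matcher (a simple stand-in for an optimal assignment solver), the function names `iou` and `associate`, and the `max_cost` values are all assumptions chosen for illustration. The point it shows is that a low cost threshold rejects uncertain matches, so tracklets stay pure at the price of being shorter.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracklet_boxes, detection_boxes, max_cost=0.3):
    """Greedy one-to-one matching (assumed stand-in for an assignment solver).

    Pairs whose cost (1 - IoU) exceeds max_cost are never matched.
    Returns (matches, unmatched_tracklets, unmatched_detections).
    """
    candidates = []
    for t, t_box in enumerate(tracklet_boxes):
        for d, d_box in enumerate(detection_boxes):
            cost = 1.0 - iou(t_box, d_box)
            if cost <= max_cost:  # the low threshold gates every candidate pair
                candidates.append((cost, t, d))
    candidates.sort()  # cheapest pairs are matched first

    matches, used_t, used_d = [], set(), set()
    for cost, t, d in candidates:
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)

    unmatched_t = [t for t in range(len(tracklet_boxes)) if t not in used_t]
    unmatched_d = [d for d in range(len(detection_boxes)) if d not in used_d]
    return matches, unmatched_t, unmatched_d


# Hypothetical boxes: the last known box of two tracklets and three new detections.
tracklets = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 0, 11, 10), (24, 20, 34, 30), (50, 50, 60, 60)]

strict = associate(tracklets, detections, max_cost=0.3)
loose = associate(tracklets, detections, max_cost=0.7)
print(strict)  # ([(0, 0)], [1], [1, 2])
print(loose)   # ([(0, 0), (1, 1)], [], [2])
```

With the strict threshold, the second tracklet is left unmatched rather than linked to a weakly overlapping detection; in a two-stage design, such fragments would be left for the later tracklet-to-tracklet stage to reconnect.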
In summary, this paper aims to resolve the trade-off between generalization ability and computational efficiency that existing MOT methods face in complex scenarios, by proposing a new two-stage association tracker.
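The second-stage idea described above, recursively merging short tracklets through edge classification over hierarchical graphs, can be sketched structurally as follows. This is only an illustration under strong assumptions: `edge_score` is a hand-written placeholder standing in for the learned message-passing GNN classifier, and the tracklet fields, the temporal windows in `gaps`, the 10-pixels-per-frame motion scale, and the 0.5 threshold are all hypothetical. What the sketch shows is the control flow: score candidate edges between tracklets, keep the edges classified as positive, merge the linked tracklets, and repeat at a coarser level.

```python
import math


def edge_score(a, b, max_gap):
    """Placeholder for a learned edge classifier; returns a score in [0, 1]."""
    gap = b["start"] - a["end"]
    if gap <= 0 or gap > max_gap:
        return 0.0  # temporally overlapping or too far apart: no edge
    dist = math.hypot(b["first_xy"][0] - a["last_xy"][0],
                      b["first_xy"][1] - a["last_xy"][1])
    return math.exp(-dist / (10.0 * gap))  # assumed motion scale: 10 px per frame


def merge(a, b):
    """Concatenate tracklet b after tracklet a."""
    return {"ids": a["ids"] + b["ids"], "start": a["start"], "end": b["end"],
            "first_xy": a["first_xy"], "last_xy": b["last_xy"]}


def merge_level(tracklets, max_gap, threshold=0.5):
    """One hierarchy level: classify edges, then merge along accepted edges."""
    edges = []
    for i, a in enumerate(tracklets):
        for j, b in enumerate(tracklets):
            if i != j:
                score = edge_score(a, b, max_gap)
                if score >= threshold:
                    edges.append((score, i, j))
    edges.sort(reverse=True)

    succ, pred = {}, {}  # each tracklet gets at most one successor and predecessor
    for score, i, j in edges:
        if i not in succ and j not in pred:
            succ[i], pred[j] = j, i

    merged = []
    for i in range(len(tracklets)):
        if i in pred:
            continue  # not the head of a chain
        current, k = tracklets[i], i
        while k in succ:
            k = succ[k]
            current = merge(current, tracklets[k])
        merged.append(current)
    return merged


def hierarchical_merge(tracklets, gaps=(5, 20, 80)):
    """Apply merge_level repeatedly with a growing temporal window."""
    for max_gap in gaps:
        tracklets = merge_level(tracklets, max_gap)
    return tracklets


# Hypothetical first-stage output: three fragments of one target plus a bystander.
fragments = [
    {"ids": [0], "start": 0, "end": 10, "first_xy": (0, 0), "last_xy": (10, 0)},
    {"ids": [1], "start": 12, "end": 20, "first_xy": (12, 0), "last_xy": (20, 0)},
    {"ids": [2], "start": 35, "end": 50, "first_xy": (36, 0), "last_xy": (50, 0)},
    {"ids": [3], "start": 0, "end": 50, "first_xy": (200, 200), "last_xy": (210, 200)},
]
trajectories = hierarchical_merge(fragments)
print([t["ids"] for t in trajectories])  # [[0, 1, 2], [3]]
```

In this toy run, the first level joins the two fragments separated by a short gap, and the second level bridges the longer gap to the third fragment; the bystander overlaps the others in time and is never linked.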