Abstract: We propose a graph-based tracking formulation for multi-object tracking (MOT) where target detections contain kinematic information and re-identification features (attributes). Our method applies a successive shortest paths (SSP) algorithm to a tracking graph defined over a batch of frames. The edge costs in this tracking graph are computed via a message-passing network, a graph neural network (GNN) variant. The parameters of the GNN, and hence, the tracker, are learned end-to-end on a training set of example ground-truth tracks and detections. Specifically, learning takes the form of bilevel optimization guided by our novel loss function. We evaluate our algorithm on simulated scenarios to understand its sensitivity to scenario aspects and model hyperparameters. Across varied scenario complexities, our method compares favorably to a strong baseline.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the data association problem in multi-object tracking (MOT). Specifically, it proposes a method based on graph neural networks (GNNs) and bilevel optimization for detections that carry kinematic information and re-identification features (attributes). The main contributions are:

1. **End-to-end learnable graph-based tracking method**: A new end-to-end learnable graph-based tracking method is proposed, which has lower computational cost than existing methods.
2. **Successive shortest paths (SSP) algorithm for inner optimization**: Applying the SSP algorithm to the tracking graph yields trajectories that are globally optimal for the given edge costs and that satisfy the tracking constraints.
3. **Quantitative analysis**: The algorithm is analyzed quantitatively in a range of synthetic scenarios and compared with a strong baseline that is also GNN-based but lacks global path optimization.

### Background and Challenges

Multi-object tracking (MOT) is a necessary step in many practical applications, such as pedestrian tracking in autonomous driving, animal and bird tracking in environmental research, and player tracking in team sports. Even when detections include features that help associate target identities and distinguish targets from clutter (such as re-identification features), MOT remains a challenging problem. The main challenges are:

- **Utilization of high-dimensional features**: How to make effective use of feature vectors of arbitrary, possibly high, dimension.
- **Joint inference**: How to perform joint, optimal inference over both attributes and target dynamics.

### Method Overview

The paper proposes an MOT method based on graph neural networks (GNNs) and bilevel optimization. The steps are as follows:

1. **Construct detection graph and tracking graph**:
   - **Detection graph**: Built from the detections within a time window. Nodes correspond to detections, and edges represent candidate associations between detections.
   - **Tracking graph**: Built from the detection graph. Each detection node corresponds to a pair of "twin" nodes in the tracking graph, and additional source and terminal nodes represent the start and end of a trajectory.
2. **Calculate edge costs**:
   - A message-passing network (MPN) computes the cost of each edge in the detection graph. An MPN is a GNN variant that updates node and edge embeddings through multiple layers of message passing.
   - The computed edge costs are copied from the detection graph to the tracking graph.
3. **Bilevel optimization**:
   - **Inner optimization**: The SSP algorithm finds the globally optimal trajectories on the tracking graph.
   - **Outer optimization**: A loss function is used to learn the GNN parameters so that predicted trajectories match the ground-truth trajectories. The loss has two parts: the difference between the cost of the predicted trajectories and the cost of the ground-truth trajectories, and a term that keeps the cost of the ground-truth trajectories negative.

### Experiments and Evaluation

The paper runs experiments on synthetic datasets to assess the effectiveness and robustness of the proposed method. The results show that it outperforms a strong baseline across scenarios of varying complexity. The evaluation metrics include:

- **Multi-object tracking accuracy (MOTA)**: Measures the overall performance of the tracking algorithm.
- **GOSPA and SIAP metrics**: GOSPA quantifies localization and cardinality errors; the SIAP metrics cover completeness, ambiguity, false trajectories, and position error.

### Conclusion

The paper proposes a new multi-object tracking method based on graph neural networks and bilevel optimization, in which the successive shortest paths algorithm ensures that the tracking result is globally optimal on the tracking graph. Experimental results show that the method performs well across complex scenarios, especially when handling high-dimensional features and noisy data.
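
To make the inner optimization concrete, the following is a minimal Python sketch of a twin-node tracking graph solved by successive shortest paths. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_tracking_graph`, `successive_shortest_paths`, `extract_tracks`), the node numbering, the entry and exit costs, and every numeric cost are hypothetical. In the paper the edge costs come from the learned message-passing network; here they are hand-picked. Bellman-Ford is used for the shortest-path step only because it tolerates negative edge costs with little code.

```python
from collections import defaultdict
from math import inf


def build_tracking_graph(det_costs, link_costs, entry_cost, exit_cost):
    """Twin-node tracking graph; every edge has unit capacity.

    Detection i becomes an in-twin (node 2*i) and an out-twin (node 2*i + 1).
    All costs are hand-picked placeholders (assumption), not learned values.
    """
    n = len(det_costs)
    source, sink = 2 * n, 2 * n + 1
    edges = []
    for i, c in enumerate(det_costs):
        edges.append((source, 2 * i, entry_cost))   # a track may start at i
        edges.append((2 * i, 2 * i + 1, c))         # twin edge: use detection i
        edges.append((2 * i + 1, sink, exit_cost))  # a track may end at i
    for (i, j), c in link_costs.items():
        edges.append((2 * i + 1, 2 * j, c))         # link i to a later detection j
    return 2 * n + 2, edges, source, sink


def successive_shortest_paths(num_nodes, edges, source, sink):
    """Augment along shortest residual paths while their cost is negative."""
    to, cost, cap = [], [], []
    adj = defaultdict(list)
    for u, v, c in edges:
        adj[u].append(len(to))  # forward edge gets an even index
        to.append(v)
        cost.append(c)
        cap.append(1)
        adj[v].append(len(to))  # its reverse edge gets the next odd index
        to.append(u)
        cost.append(-c)
        cap.append(0)
    total = 0.0
    while True:
        # Bellman-Ford, because residual edges can have negative cost.
        dist, prev_edge = {source: 0.0}, {}
        for _ in range(num_nodes - 1):
            changed = False
            for u in list(dist):
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist.get(to[e], inf):
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist.get(sink, inf) >= 0:
            break  # one more track would not lower the total cost
        v = sink
        while v != source:  # push one unit of flow along the path
            e = prev_edge[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]  # tail of edge e
        total += dist[sink]
    used = [edges[e // 2] for e in range(0, len(to), 2) if cap[e] == 0]
    return total, used


def extract_tracks(used_edges, source, sink):
    """Read trajectories (lists of detection indices) off the saturated edges."""
    starts, nxt = [], {}
    for u, v, _ in used_edges:
        if u == source:
            starts.append(v // 2)
        elif v != sink and u % 2 == 1:  # out-twin -> in-twin link edge
            nxt[u // 2] = v // 2
    tracks = []
    for s in sorted(starts):
        track = [s]
        while track[-1] in nxt:
            track.append(nxt[track[-1]])
        tracks.append(track)
    return tracks


# Toy batch: detections 0, 1 in frame 0; detections 2, 3, 4 in frame 1.
# Detection 4 plays the role of clutter (positive detection cost).
det_costs = [-3.0, -3.0, -3.0, -3.0, 1.0]
link_costs = {(0, 2): 0.1, (0, 3): 2.0, (1, 2): 2.0,
              (1, 3): 0.2, (0, 4): 1.5, (1, 4): 1.5}
n_nodes, edges, s, t = build_tracking_graph(det_costs, link_costs, 1.0, 1.0)
total, used = successive_shortest_paths(n_nodes, edges, s, t)
tracks = extract_tracks(used, s, t)
print(tracks, round(total, 3))
```

With these placeholder costs the solver keeps two trajectories, linking detection 0 to 2 and detection 1 to 3, and leaves the clutter detection unused because any path through it has positive cost. The twin edge is what enforces the tracking constraint: its unit capacity lets each detection belong to at most one trajectory.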
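
For reference, MOTA as named in the evaluation above follows the standard CLEAR MOT definition: one minus the ratio of summed errors (missed targets, false positives, identity switches) to the summed number of ground-truth objects over all frames. The short Python sketch below computes it from per-frame counts. The function name `mota` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """MOTA = 1 - sum_t(FN_t + FP_t + IDSW_t) / sum_t(GT_t).

    Each argument is a sequence with one count per frame.
    """
    total_gt = sum(num_ground_truth)
    if total_gt <= 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Made-up counts for three frames with five ground-truth objects each.
score = mota(
    false_negatives=[1, 0, 2],
    false_positives=[0, 1, 0],
    id_switches=[0, 0, 1],
    num_ground_truth=[5, 5, 5],
)
print(f"MOTA = {score:.3f}")
```

Because errors are summed before dividing, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.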

SSP-GNN: Learning to Track via Bilevel Optimization

Exploit the Connectivity: Multi-Object Tracking with TrackletNet

Split and Connect: A Universal Tracklet Booster for Multi-Object Tracking

Track Without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

Learning a Neural Solver for Multiple Object Tracking

Graph Networks for Multiple Object Tracking

Tracklets Predicting Based Adaptive Graph Tracking

Object-Level Pseudo-3D Lifting for Distance-Aware Tracking

Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking

SCGTracker: object feature embedding enhancement based on graph attention networks for multi-object tracking

Learnable Online Graph Representations for 3D Multi-Object Tracking

Enhancing the association in multi‐object tracking via neighbor graph

Multi-Object Tracking and Segmentation via Neural Message Passing

HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision

Multi-object Tracking by Expanding Long-Tracklets

Part-Based Multi-Graph Ranking for Visual Tracking

Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker

Learning of Global Objective for Network Flow in Multi-Object Tracking

MLGT: multi-local guided tracker for visual object tracking

TGCN: Time Domain Graph Convolutional Network for Multiple Objects Tracking