FAMNet: Joint Learning of Feature, Affinity and Multi-dimensional Assignment for Online Multiple Object Tracking

Peng Chu, Haibin Ling
DOI: https://doi.org/10.48550/arXiv.1904.04989
2019-04-10
Abstract: Data association-based multiple object tracking (MOT) involves multiple separated modules processed or optimized differently, which results in complex method design and requires non-trivial tuning of parameters. In this paper, we present an end-to-end model, named FAMNet, where Feature extraction, Affinity estimation and Multi-dimensional assignment are refined in a single network. All layers in FAMNet are designed differentiable thus can be optimized jointly to learn the discriminative features and higher-order affinity model for robust MOT, which is supervised by the loss directly from the assignment ground truth. We also integrate single object tracking technique and a dedicated target management scheme into the FAMNet-based tracking system to further recover false negatives and inhibit noisy target candidates generated by the external detector. The proposed method is evaluated on a diverse set of benchmarks including MOT2015, MOT2017, KITTI-Car and UA-DETRAC, and achieves promising performance on all of them in comparison with state-of-the-arts.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses data association in multiple object tracking (MOT). Data association-based MOT methods usually handle feature extraction, affinity estimation, and multi-dimensional assignment (MDA) as separate modules that are processed or optimized differently, which complicates method design and requires non-trivial parameter tuning. The paper proposes FAMNet, an end-to-end model that refines all three modules in a single network. Because every layer is differentiable, the modules can be optimized jointly, supervised by a loss computed directly from the assignment ground truth. FAMNet is organized into three parts:

1. **Feature Subnetwork**: Extracts features from the candidate objects in each frame.
2. **Affinity Subnetwork**: Uses the extracted features to estimate the higher-order affinity of all hypothesized trajectories.
3. **Multi-Dimensional Assignment Subnetwork**: Takes the affinity tensor and obtains the assignment by optimizing the global multi-dimensional assignment.

In addition, the tracking system built on FAMNet integrates a single object tracking technique and a dedicated target management scheme, which recover false negatives (missed targets) and suppress noisy target candidates generated by the external detector. The method is evaluated on MOT2015, MOT2017, KITTI-Car, and UA-DETRAC, where the authors report promising performance in comparison with state-of-the-art methods.
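To make the feature, affinity, and assignment stages concrete, here is a minimal Python sketch of the data flow. It is not the authors' implementation: the per-candidate feature vectors are assumed to be given (standing in for a learned feature subnetwork), the affinity of a hypothesized trajectory is assumed to be a sum of cosine similarities between consecutive frames (FAMNet learns its affinity instead), and a simple greedy selection stands in for the paper's differentiable global optimization. The function names `affinity_tensor` and `greedy_assignment` are hypothetical.

```python
import itertools
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def affinity_tensor(features):
    """Build a K-dimensional affinity tensor from per-frame features.

    features: list of K arrays, one per frame, each of shape (n_k, d).
    Entry [i_1, ..., i_K] scores the hypothesized trajectory that links
    candidate i_1 in frame 1, i_2 in frame 2, and so on.
    Assumption: the score is the sum of consecutive-frame cosine similarities.
    """
    shape = tuple(f.shape[0] for f in features)
    tensor = np.zeros(shape)
    for idx in itertools.product(*(range(n) for n in shape)):
        tensor[idx] = sum(
            cosine_similarity(features[k][idx[k]], features[k + 1][idx[k + 1]])
            for k in range(len(features) - 1)
        )
    return tensor


def greedy_assignment(tensor):
    """Stand-in for the assignment stage (NOT the paper's optimizer).

    Visits trajectories from highest to lowest affinity and keeps one only
    if none of its candidates has been used, so each candidate in each
    frame belongs to at most one trajectory.
    """
    order = np.argsort(tensor, axis=None)[::-1]
    used = [set() for _ in tensor.shape]
    chosen = []
    for flat in order:
        idx = tuple(int(i) for i in np.unravel_index(flat, tensor.shape))
        if all(i not in used[k] for k, i in enumerate(idx)):
            chosen.append(idx)
            for k, i in enumerate(idx):
                used[k].add(i)
    return chosen


# Toy input: 3 frames, 2 candidates per frame, 2-D features.
# The two objects swap detection order in the middle frame.
frames = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 1.0], [1.0, 0.0]]),
    np.array([[1.0, 0.1], [0.1, 1.0]]),
]
tensor = affinity_tensor(frames)
print(sorted(greedy_assignment(tensor)))
```

The greedy step is chosen here only because it is short and easy to check by hand. It is not differentiable, so it could not be trained end to end the way the summary describes for FAMNet.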