Learning Task-Specific Discriminative Representations for Multiple Object Tracking
Han Wu,Jiahao Nie,Ziming Zhu,Zhiwei He,Mingyu Gao
DOI: https://doi.org/10.1007/s00521-022-08079-3
2022-01-01
Abstract:One-shot multiple object tracking (MOT), which learns object detection and identity embedding in a unified network, has attracted increasing attention due to its low complexity and high tracking speed. However, most one-shot trackers ignore that detection and re-identification (ReID) require different representations of features. The inherent difference between these two subtasks leads to optimization contradictions in the training procedure. This issue would result in suboptimal tracking performance. To alleviate this contradiction, we propose a novel dual-path transformation network (DTN) that decouples the shared features into detection-specific and ReID-specific representations. By learning task-specific features, this module satisfies the different requirements of both subtasks. Moreover, we observe that previous trackers generally utilize local information to distinguish targets and ignore global semantic relations, which are crucial for tracking. Therefore, we design a pyramid non-local network (PNN) that allows our network to explore pixel-to-pixel relations with a global receptive field. Meanwhile, PNN considers the scale information to enhance the robustness to scale variations. Extensive experiments conducted on three benchmarks, i.e., MOT16, MOT17, and MOT20, demonstrate the superiority of our tracker, namely DPTrack. The experimental results reveal that DPTrack achieves state-of-the-art performance, e.g., MOTA of 77.1 % and IDF1 of 74.9 % on MOT17. Moreover, DPTrack runs at 14.9FPS, and our lightweight version runs at 26.6FPS with only a slight performance decay.