Aggregate Tracklet Appearance Features for Multi-Object Tracking

Long Chen, Haizhou Ai, Rui Chen, Zijie Zhuang
DOI: https://doi.org/10.1109/lsp.2019.2940922
2019-01-01
IEEE Signal Processing Letters
Abstract: Multi-object tracking (MOT) has wide applications in video analysis and signal processing. A major challenge in MOT is how to associate noisy detections into long, continuous trajectories. In this letter, we address the association problem at the tracklet level, focusing on the appearance representation designed for tracklets. A multitask convolutional neural network is proposed to jointly learn discriminative features and spatial-temporal attentions. In particular, we decompose an object in a static image with spatial attentions, and then aggregate multiple features in a tracklet based on the temporal attentions. Appearance misalignment caused by occlusion and inaccurate bounding boxes is then mitigated by multi-feature aggregation. Experimental results on two challenging MOT benchmarks demonstrate the effectiveness of the proposed method and show significant improvement in the quality of tracking identities.
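The temporal aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes that each detection in a tracklet has a feature vector and a scalar temporal attention score, that the scores are normalized with a softmax, and that two tracklets are compared by cosine similarity. The function names `aggregate_tracklet` and `cosine_similarity` are hypothetical and are not taken from the paper.

```python
import math


def aggregate_tracklet(features, scores):
    """Attention-weighted average of per-detection feature vectors.

    features: list of equal-length feature vectors, one per detection.
    scores:   list of unnormalized temporal attention scores, one per detection.
    Both inputs are assumptions; the abstract does not specify their form.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per feature vector")
    # Softmax over time, shifted by the max score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    # Weighted sum across detections, one feature dimension at a time.
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two aggregated tracklet features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Under these assumptions, a detection with a low attention score (for example, an occluded frame) contributes little to the aggregated feature, which is one plausible way the misalignment mentioned in the abstract could be reduced.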