Near-Online Multi-Pedestrian Tracking via Combining Multiple Consistent Appearance Cues
Weijiang Feng, Long Lan, Yong Luo, Yue Yu, Xiang Zhang, Zhigang Luo
DOI: https://doi.org/10.1109/tcsvt.2020.3005662
IF: 5.859
2021-04-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: An important cue for multi-pedestrian tracking in video is that an individual's appearance stays consistent over an extended period. In this paper, we address multi-pedestrian tracking by learning a robust appearance model within the tracking-by-detection paradigm. To separate detections of different pedestrians while grouping detections of the same pedestrian, we exploit this consistent-appearance cue through three types of evidence, drawn from the recent, past, and near-future frames. Existing online approaches exploit only the detection-to-detection and sequence-to-detection metrics, which focus on recent and past appearance patterns respectively, and ignore future pedestrian appearance. We remedy this drawback by additionally considering a sequence-to-sequence metric, which draws on the near-future appearance representation. Adaptive combination weights are learned to fuse the three metrics. Moreover, we propose a novel Focal Triplet Loss that makes the model focus more on hard examples than on easy ones. We demonstrate that this significantly enhances the discriminative power of the model compared with treating every sample equally. The effectiveness and efficiency of the proposed method are verified through comprehensive ablation studies and comparisons with many competitive offline, online, and near-online counterparts on the MOT16 and MOT17 Challenges.
engineering, electrical & electronic
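
The abstract names two mechanisms without giving formulas: fusing three appearance metrics with learned weights, and a triplet loss that emphasizes hard examples. The sketch below is one possible reading, not the paper's formulation. Every name, the softmax normalization of the weights, the sigmoid-based "easiness" score, and the parameters `margin` and `gamma` are assumptions made for illustration, borrowing the modulating-factor idea from the focal loss used in object detection.

```python
import numpy as np


def fuse_affinities(scores, logits):
    """Fuse three appearance affinities with normalized weights.

    scores: affinities from the detection-to-detection, sequence-to-detection,
            and sequence-to-sequence metrics (order is an assumption).
    logits: unnormalized weights; in a real system these would be learned.
    Softmax normalization is an assumption, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    w /= w.sum()
    return float(np.dot(w, scores))


def focal_triplet_loss(d_ap, d_an, margin=0.2, gamma=2.0):
    """Hypothetical focal-style triplet loss.

    d_ap: anchor-positive distances, d_an: anchor-negative distances.
    Each triplet's hinge loss is scaled by (1 - p) ** gamma, where p is a
    sigmoid of the distance gap and acts as an "easiness" score.
    With gamma = 0 this reduces to the plain mean triplet hinge loss.
    """
    d_ap = np.asarray(d_ap, dtype=float)
    d_an = np.asarray(d_an, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(d_an - d_ap)))  # near 1 when the negative is far
    weight = (1.0 - p) ** gamma               # down-weights easy triplets
    hinge = np.maximum(0.0, d_ap - d_an + margin)
    return float(np.mean(weight * hinge))


if __name__ == "__main__":
    # Equal logits give a plain average of the three affinities.
    print(fuse_affinities([0.9, 0.6, 0.3], [0.0, 0.0, 0.0]))
    # A hard triplet (negative closer than positive) versus a mild violation.
    print(focal_triplet_loss([1.0], [0.5]), focal_triplet_loss([0.5], [0.6]))
```

Under these assumptions, raising `gamma` widens the gap between the loss contributed by hard triplets and that contributed by mildly violating ones, which is the behavior the abstract attributes to the Focal Triplet Loss. The actual weighting scheme and fusion rule should be taken from the paper itself.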