Self-supervised Multi-view Multi-Human Association and Tracking

Yiyang Gan,Ruize Han,Liqiang Yin,Wei Feng,Song Wang
DOI: https://doi.org/10.1145/3474085.3475177
2021-10-17
Abstract:Multi-view Multi-human association and tracking (MvMHAT) aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time. This is a relatively new problem but is very important for multi-person scene video surveillance. Different from previous multiple object tracking (MOT) and multi-target multi-camera tracking (MTMCT) tasks, which only consider the over-time human association, MvMHAT requires to jointly achieve both cross-view and over-time data association. In this paper, we model this problem with a self-supervised learning framework and leverage an end-to-end network to tackle it. Specifically, we propose a spatial-temporal association network with two designed self-supervised learning losses, including a symmetric-similarity loss and a transitive-similarity loss, at each time to associate the multiple humans over time and across views. Besides, to promote the research on MvMHAT, we build a new large-scale benchmark for the training and testing of different algorithms. Extensive experiments on the proposed benchmark verify the effectiveness of our method. We have released the benchmark and code to the public.
What problem does this paper attempt to address?