A Unified Multi-view Multi-person Tracking Framework

Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang
DOI: https://doi.org/10.48550/arXiv.2302.03820
2023-02-08
Abstract: Although there is a significant development in 3D Multi-view Multi-person Tracking (3D MM-Tracking), current 3D MM-Tracking frameworks are designed separately for footprint and pose tracking. Specifically, frameworks designed for footprint tracking cannot be utilized in 3D pose tracking, because they directly obtain 3D positions on the ground plane with a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be robust to footprint tracking, since footprint tracking utilizes fewer key points than pose tracking, which weakens multi-view association cues in a single frame. This study presents a Unified Multi-view Multi-person Tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can adopt monocular 2D bounding boxes and 2D poses as the input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve the performance of association and triangulation. The effectiveness of our framework is verified by accomplishing state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, and by comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses three issues in multi-view multi-person tracking:

1. **Separation of existing frameworks**: Current multi-view multi-person tracking frameworks are typically designed separately for footprint tracking and pose tracking. Footprint tracking frameworks cannot be applied directly to pose tracking because they obtain positions through a homography projection onto the ground plane, which does not hold for 3D poses above the ground. Conversely, pose tracking frameworks usually isolate multi-view and multi-frame associations and may not be robust in footprint tracking, because footprint tracking uses fewer keypoints than pose tracking, which weakens multi-view association cues within a single frame.
2. **Effective utilization of multi-view and multi-frame information**: Existing methods do not jointly exploit multi-view and multi-frame information, which limits tracking performance in complex scenarios. For example, footprint tracking methods typically assume individuals stand on flat ground, while pose tracking methods struggle to associate multi-view features correctly within a single frame.
3. **Real-time performance and robustness**: Current multi-view multi-person tracking methods often lack real-time performance and robustness on large-scale real-world problems, especially under occlusion, noise, and outliers.

To address these issues, the paper proposes a unified multi-view multi-person tracking framework that can:

- **Integrate the advantages of footprint tracking and pose tracking**: A single framework handles both footprint tracking and pose tracking without additional modifications.
- **Jointly utilize multi-frame and multi-view information**: The framework processes video online with a sliding window. It first links 2D positions into 2D trajectories, then computes cross-view consistency distances between those trajectories, using normalized epipolar distances to enhance multi-view consistency.
- **Robust clustering and triangulation**: The framework introduces Propagated Distance Non-parametric Clustering (PDNC), a non-parametric clustering method that associates 2D trajectories across views. It also proposes Collaborative Multi-frame Multi-view Triangulation (CMMT) to compute 3D positions and eliminate outliers.
- **Online processing and real-time performance**: Online processing reduces the impact of trajectory ID switches in each monocular view and achieves real-time performance.

In summary, the paper aims to improve the robustness and real-time performance of multi-view multi-person tracking through a unified framework suitable for both footprint and pose tracking tasks.
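
The limitation of homography-based footprint tracking mentioned above can be illustrated with a small sketch. It is not taken from the paper: the camera (looking straight down from 5 m with a 1000-pixel focal length), the helper names, and the test points are all assumptions chosen to keep the arithmetic simple. The image-to-ground homography recovers a foot point on the plane Z = 0, but misplaces a head point that lies above the plane.

```python
import math

FOCAL = 1000.0   # focal length in pixels (assumed)
HEIGHT = 5.0     # camera height above the ground in metres (assumed)

# Projection matrix P = K [R | t] for a hypothetical camera at (0, 0, HEIGHT)
# looking straight down, with the principal point at the image origin.
P = [
    [FOCAL, 0.0, 0.0, 0.0],
    [0.0, -FOCAL, 0.0, 0.0],
    [0.0, 0.0, -1.0, HEIGHT],
]

# Image-to-ground homography: the inverse of columns 1, 2 and 4 of P,
# which is the projection restricted to the ground plane Z = 0.
H_IMAGE_TO_GROUND = [
    [1.0 / FOCAL, 0.0, 0.0],
    [0.0, -1.0 / FOCAL, 0.0],
    [0.0, 0.0, 1.0 / HEIGHT],
]


def project(point_3d):
    """Project a 3D world point to pixel coordinates with P."""
    X, Y, Z = point_3d
    xh = (X, Y, Z, 1.0)
    u, v, w = (sum(row[k] * xh[k] for k in range(4)) for row in P)
    return (u / w, v / w)


def image_to_ground(pixel):
    """Map a pixel to ground-plane coordinates, assuming the point has Z = 0."""
    u, v = pixel
    ph = (u, v, 1.0)
    x, y, w = (sum(row[k] * ph[k] for k in range(3)) for row in H_IMAGE_TO_GROUND)
    return (x / w, y / w)


def ground_error(point_3d):
    """Horizontal distance between the true position and the homography estimate."""
    gx, gy = image_to_ground(project(point_3d))
    return math.hypot(gx - point_3d[0], gy - point_3d[1])


foot = (2.0, 1.0, 0.0)   # on the ground plane
head = (2.0, 1.0, 1.7)   # same person, 1.7 m above the ground
print(f"foot error: {ground_error(foot):.3f} m")
print(f"head error: {ground_error(head):.3f} m")
```

In this toy setup the foot is recovered almost exactly, while the head lands more than a metre away from the person's true horizontal position. This is why a ground-plane homography alone cannot serve 3D pose tracking and some form of multi-view triangulation is needed instead.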
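
The cross-view consistency distance between 2D trajectories can be sketched as follows. This is a minimal illustration, not the paper's formulation: it uses the standard symmetric epipolar distance, averages it over the frames two trajectories share, and divides by a `scale` parameter as a stand-in for normalization (the paper's exact normalization is not given in this summary). The fundamental matrix of an ideal rectified stereo pair and the two toy tracks are assumptions.

```python
import math


def epipolar_line(F, point):
    """Return line coefficients (a, b, c) of F @ [u, v, 1]."""
    u, v = point
    x = (u, v, 1.0)
    return tuple(sum(F[r][k] * x[k] for k in range(3)) for r in range(3))


def point_line_distance(point, line):
    """Perpendicular distance from a pixel to the line a*u + b*v + c = 0."""
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / math.hypot(a, b)


def transpose(F):
    return [[F[c][r] for c in range(3)] for r in range(3)]


def symmetric_epipolar_distance(F, p1, p2):
    """Average of the two point-to-epipolar-line distances (views 1 and 2)."""
    d2 = point_line_distance(p2, epipolar_line(F, p1))
    d1 = point_line_distance(p1, epipolar_line(transpose(F), p2))
    return 0.5 * (d1 + d2)


def trajectory_distance(F, track1, track2, scale=1.0):
    """Mean symmetric epipolar distance over shared frames, divided by scale.

    Tracks are dicts mapping frame index -> (u, v). `scale` is a hypothetical
    normalizer (for example a bounding-box height in pixels).
    """
    shared = sorted(set(track1) & set(track2))
    if not shared:
        return math.inf  # no temporal overlap, so no evidence of consistency
    total = sum(symmetric_epipolar_distance(F, track1[t], track2[t]) for t in shared)
    return total / (len(shared) * scale)


# Fundamental matrix of an ideal rectified stereo pair (assumed setup):
# corresponding points must lie on the same image row.
F_RECTIFIED = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

track_a = {0: (100.0, 50.0), 1: (102.0, 52.0), 2: (104.0, 54.0)}
track_b = {1: (80.0, 52.0), 2: (82.0, 55.0), 3: (84.0, 58.0)}
print(trajectory_distance(F_RECTIFIED, track_a, track_b, scale=10.0))
```

Averaging over a window of frames, rather than scoring a single frame, is what lets sparse cues such as a single footprint point still yield a usable association signal. A clustering step such as PDNC would then operate on the resulting pairwise distances.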