Abstract: Although there has been significant development in 3D Multi-view Multi-person Tracking (3D MM-Tracking), current 3D MM-Tracking frameworks are designed separately for footprint and pose tracking. Specifically, frameworks designed for footprint tracking cannot be used for 3D pose tracking, because they obtain 3D positions on the ground plane directly with a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be robust for footprint tracking, since footprint tracking uses fewer key points than pose tracking, which weakens multi-view association cues in a single frame. This study presents a Unified Multi-view Multi-person Tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can take monocular 2D bounding boxes and 2D poses as input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve the performance of association and triangulation. The effectiveness of our framework is verified by achieving state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, and comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
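
The homography limitation mentioned in the abstract can be illustrated with a small sketch. This is not the paper's code: it assumes a toy pinhole camera looking straight down from 4 m, and the camera matrix `P`, the helper names (`project`, `inv3`, `ground_homography`, `to_ground`) and all numbers are hypothetical. A point on the ground (a footprint) is recovered exactly by the image-to-ground homography, while a point above the ground (a head keypoint) is mapped to the wrong ground location.

```python
def project(P, X):
    """Project a 3D point X = (x, y, z) to a pixel (u, v) with a 3x4 camera matrix P."""
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    return u / w, v / w


def inv3(M):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def ground_homography(P):
    """Image-to-ground homography for the plane z = 0.

    For points with z = 0 the projection only uses columns 0, 1 and 3 of P,
    so inverting that 3x3 matrix maps pixels back to ground coordinates.
    """
    return inv3([[row[0], row[1], row[3]] for row in P])


def to_ground(H, u, v):
    """Apply homography H to pixel (u, v) and dehomogenize."""
    x, y, w = (H[r][0] * u + H[r][1] * v + H[r][2] for r in range(3))
    return x / w, y / w


# Hypothetical camera: focal length 1000 px, principal point (640, 360),
# centered above the origin at height 4 m and looking straight down.
P = [[1000.0, 0.0, -640.0, 2560.0],
     [0.0, -1000.0, -360.0, 1440.0],
     [0.0, 0.0, -1.0, 4.0]]
H = ground_homography(P)

foot = to_ground(H, *project(P, (2.0, 1.0, 0.0)))  # point on the ground plane
head = to_ground(H, *project(P, (2.0, 1.0, 1.7)))  # same person, 1.7 m above ground
print("foot ->", foot)
print("head ->", head)
```

In this toy setup the footprint comes back at its true position (2, 1), whereas the head lands at roughly (3.48, 1.74): the homography returns the point where the viewing ray meets the ground, not the position of the keypoint itself. That is why a ground-plane homography alone cannot recover 3D poses.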

What problem does this paper attempt to address?

The paper attempts to address three key issues in multi-view multi-person tracking:

1. **Separation of existing frameworks**: Current multi-view multi-person tracking frameworks are typically designed separately for footprint tracking and pose tracking. Footprint tracking frameworks cannot be applied directly to pose tracking because they obtain 3D positions through a homography projection onto the ground plane, which does not apply to body keypoints above the ground. Conversely, pose tracking frameworks usually perform multi-view and multi-frame association in isolation and may not be robust for footprint tracking, because footprint tracking uses fewer keypoints than pose tracking, which weakens the multi-view association cues available within a single frame.
2. **Effective utilization of multi-view and multi-frame information**: Existing methods do not exploit multi-view and multi-frame information jointly, which limits tracking performance in complex scenarios. For example, footprint tracking methods typically assume that individuals stand on flat ground, while pose tracking methods struggle to associate multi-view features correctly within a single frame.
3. **Real-time performance and robustness**: Current multi-view multi-person tracking methods often lack real-time performance and robustness on large-scale real-world problems, especially under occlusion, noise, and outliers.

To address these issues, the paper proposes a unified multi-view multi-person tracking framework that can:

- **Integrate the advantages of footprint tracking and pose tracking**: A single framework handles multi-view tracking of both footprints and poses without additional modifications.
- **Jointly utilize multi-frame and multi-view information**: The framework processes video online with a sliding window. It first links 2D positions into 2D trajectories, then computes cross-view consistency distances between trajectories, using normalized epipolar distances to strengthen multi-view consistency.
- **Robust clustering and triangulation**: Propagated Distance Non-parametric Clustering (PDNC), a non-parametric clustering method, handles the association of 2D trajectories across views. A Collaborative Multi-frame Multi-view Triangulation (CMMT) method then computes 3D positions and eliminates outliers.
- **Online processing and real-time performance**: Online processing reduces the impact of trajectory ID switches in each monocular view and achieves real-time performance.

In summary, the paper aims to improve the robustness and real-time performance of multi-view multi-person tracking through a unified framework suitable for both footprint and pose tracking tasks.
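
The cross-view consistency step described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the summary does not give the exact normalization, so the sketch divides a symmetric point-to-epipolar-line distance by a caller-supplied `scale` (for example a bounding-box height), and it averages over the frames two 2D trajectories share inside the window. The function names, the dictionary-based track format and the fundamental matrix `F` (with the convention `x2^T F x1 = 0`) are all hypothetical.

```python
import math


def epipolar_line(F, p):
    """Return line coefficients (a, b, c) of F @ (u, v, 1), meaning a*u + b*v + c = 0."""
    u, v = p
    return tuple(F[r][0] * u + F[r][1] * v + F[r][2] for r in range(3))


def point_line_distance(line, p):
    """Perpendicular pixel distance from point p = (u, v) to a line (a, b, c)."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line, treat as inconsistent
    return abs(a * p[0] + b * p[1] + c) / norm


def normalized_epipolar_distance(F, p1, p2, scale):
    """Symmetric epipolar distance between p1 (view 1) and p2 (view 2), divided by scale.

    Assumption: `scale` is a per-person size in pixels chosen by the caller;
    the paper's actual normalization may differ.
    """
    d12 = point_line_distance(epipolar_line(F, p1), p2)   # p2 against the line of p1
    Ft = [list(col) for col in zip(*F)]                   # F^T maps view-2 points to view-1 lines
    d21 = point_line_distance(epipolar_line(Ft, p2), p1)  # p1 against the line of p2
    return 0.5 * (d12 + d21) / scale


def trajectory_distance(F, track1, track2, scale):
    """Mean normalized epipolar distance over frames present in both 2D trajectories.

    Each track is a dict {frame_index: (u, v)} covering the current sliding window.
    Returns infinity when the trajectories never overlap in time.
    """
    shared = sorted(set(track1) & set(track2))
    if not shared:
        return float("inf")
    total = sum(normalized_epipolar_distance(F, track1[t], track2[t], scale)
                for t in shared)
    return total / len(shared)


# Hypothetical rectified stereo pair: epipolar lines are horizontal image rows.
F_rectified = [[0, 0, 0],
               [0, 0, -1],
               [0, 1, 0]]
track_view1 = {0: (100, 50), 1: (110, 52), 2: (120, 54)}
track_view2 = {1: (90, 52), 2: (95, 64), 3: (99, 70)}
print(trajectory_distance(F_rectified, track_view1, track_view2, scale=100.0))
```

Averaging over several frames is the point of the sketch: a single noisy frame (frame 2 above) is diluted by consistent ones, which is one plausible reading of why trajectory-level distances give stronger association cues than single-frame distances. A clustering step such as PDNC would then operate on the resulting pairwise distance matrix.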

A Unified Multi-view Multi-person Tracking Framework

A Multi-Modal Fusion-Based 3D Multi-Object Tracking Framework with Joint Detection

Multi-person Multi-Camera Tracking for Live Stream Videos Based on Improved Motion Model and Matching Cascade

2D-3D Pose Tracking with Multi-View Constraints

Real-Time Multiple Pedestrians Tracking in Multi-camera System

Part-Aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking

MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving

Multi-Person Articulated Tracking With Spatial and Temporal Embeddings

A Multi-View Pedestrian Tracking Framework Based on Graph Matching

Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking

Single/cross-camera multiple-person tracking by graph matching

A UNIFIED FRAMEWORK FOR JOINT VIDEO PEDESTRIAN SEGMENTATION AND POSE TRACKING

Skeleton Cluster Tracking for robust multi-view multi-person 3D human pose estimation

MMF-Track: Multi-modal Multi-level Fusion for 3D Single Object Tracking

VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild

Generic Multiview Visual Tracking.

Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality Collaboration

You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking

An End-to-end Tracking Framework Via Multi-View and Temporal Feature Aggregation

A Multi-body Tracking Framework - From Rigid Objects to Kinematic Structures

Multi-View Object Tracking for Motion Capture