MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon
2024-03-29
Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting, which limits their ability to model real-world dynamics and generalize to diverse camera configurations. To address this issue, we present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments - campus and factory - across various time, weather, and season conditions. This dataset provides a challenging test-bed for studying multi-camera tracking under diverse real-world complexities and includes an additional input modality of spatially aligned and temporally synchronized RGB and thermal cameras, which enhances the accuracy of multi-camera tracking. MTMMC is a super-set of existing datasets, benefiting independent fields such as person detection, re-identification, and multiple object tracking. We provide baselines and new learning setups on this dataset and set the reference scores for future studies. The datasets, models, and test server will be made publicly available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the practical challenges of multi-target multi-camera tracking (MTMCT). Existing MTMCT datasets are either synthetically generated with game simulators or small-scale real data collected in a controlled camera network. Such datasets model real-world dynamics poorly, which limits how well methods built on them generalize to diverse camera configurations. High-quality real-world datasets are also scarce because data collection and annotation are expensive.

To address these problems, the paper proposes a new benchmark dataset named MTMMC (Multi-Target Multi-Modal Camera Tracking). It contains long video sequences captured by 16 multi-modal cameras installed in two environments (campus and factory), covering different time, weather, and season conditions. MTMMC provides a challenging test bed for multi-camera tracking in complex real-world environments, and it introduces spatially aligned and temporally synchronized RGB and thermal cameras as an additional input modality to improve tracking accuracy.

The main contributions of the paper include:

1. **Large-scale real-world dataset**: MTMMC is presented as the largest publicly available multi-target multi-camera tracking dataset, containing 3,052,800 frames of high-resolution video that cover indoor and outdoor scenes.
2. **Multi-modal data**: It provides paired RGB and thermal data for the first time, which supports studying the effect of multi-modal learning on multi-camera tracking.
3. **Diverse environmental conditions**: The dataset covers different time, weather, and season conditions, which keeps the data diverse and representative.
4. **High-quality annotation**: A semi-automatic annotation system followed by manual verification ensures accurate and consistent annotations.

In conclusion, MTMMC aims to provide a test platform closer to the real world for multi-target multi-camera tracking research and to promote progress in related fields.
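To give a sense of scale, the quoted totals can be turned into a rough per-camera figure. The sketch below is only back-of-the-envelope arithmetic: it assumes the 3,052,800 frames are split evenly across the 16 cameras, which the summary does not state, and the function name `frames_per_camera` is hypothetical.

```python
# Figures quoted in the summary above.
TOTAL_FRAMES = 3_052_800
NUM_CAMERAS = 16


def frames_per_camera(total_frames: int, num_cameras: int) -> int:
    """Return the per-camera frame count.

    ASSUMPTION: frames are distributed evenly across cameras. The source
    only gives the total, so this is an illustrative estimate.
    """
    if num_cameras <= 0:
        raise ValueError("num_cameras must be positive")
    frames, remainder = divmod(total_frames, num_cameras)
    if remainder:
        # The even-split assumption cannot hold exactly in this case.
        raise ValueError("total_frames is not divisible by num_cameras")
    return frames


per_camera = frames_per_camera(TOTAL_FRAMES, NUM_CAMERAS)
print(per_camera)  # 190800
```

The total divides by 16 without remainder, which is consistent with an even split but does not confirm it.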