DynamicTrack: Advancing Gigapixel Tracking in Crowded Scenes

Yunqi Zhao, Yuchen Guo, Zheng Cao, Kai Ni, Ruqi Huang, Lu Fang
2024-07-26
Abstract: Tracking in gigapixel scenarios holds numerous potential applications in video surveillance and pedestrian analysis. Existing algorithms attempt to perform tracking in crowded scenes by utilizing multiple cameras or group relationships. However, their performance significantly degrades when confronted with complex interaction and occlusion inherent in gigapixel images. In this paper, we introduce DynamicTrack, a dynamic tracking framework designed to address gigapixel tracking challenges in crowded scenes. In particular, we propose a dynamic detector that utilizes contrastive learning to jointly detect the head and body of pedestrians. Building upon this, we design a dynamic association algorithm that effectively utilizes head and body information for matching purposes. Extensive experiments show that our tracker achieves state-of-the-art performance on widely used tracking benchmarks specifically designed for gigapixel crowded scenes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses multi-object tracking in crowded scenes captured in high-resolution (gigapixel) images. The performance of existing multi-object tracking algorithms drops significantly on such images because of complex interactions and severe occlusions. To address this, the authors propose the DynamicTrack framework, which aims to improve tracking accuracy in crowded scenes by combining head and body information.

### Main Problems

1. **Complex Interactions and Severe Occlusions**: Crowd scenes in high-resolution images usually contain complex interactions and severe occlusions, which make it hard for traditional tracking algorithms to work effectively.
2. **Limitations of Existing Methods**:
   - Multi-camera tracking methods disperse spatial information across views, so they struggle with the rigid segmentation of what is actually a continuous space.
   - Methods that use group relationships can improve robustness, but capturing those relationships is itself challenging.

### Solutions

To overcome these challenges, the paper proposes the following:

1. **Dynamic Detection**:
   - The authors design a dynamic detector based on contrastive learning that detects the heads and bodies of pedestrians simultaneously. The detector uses embedding learning and optimizes feature learning through the Associative Embedding Loss (AML).
   - The formulas are as follows:
     \[
     L_{pull} = \mu (L_{bb}^{pull} + L_{hh}^{pull}) + \beta L_{bh}^{pull}
     \]
     \[
     L_{push} = \mu (L_{bb}^{push} + L_{hh}^{push}) + \beta L_{bh}^{push}
     \]
     \[
     L_{AML} = \sigma L_{pull} + \tau L_{push}
     \]
2. **Dynamic Association**:
   - The authors propose a dynamic association algorithm that makes full use of head and body features for matching. It treats the body as the primary cue and the head as an auxiliary cue, combining fine-grained local head features with global body information to improve tracking robustness.
   - The algorithm uses cascade matching to process matched head-body pairs, unmatched bodies, and unmatched heads separately, so that the available information is still used effectively under occlusion.

### Experimental Results

The experimental results show that DynamicTrack achieves state-of-the-art performance on multiple public datasets (such as MOT20 and PANDA). Its advantage is most pronounced in complex crowded scenes in high-resolution images.

In conclusion, the paper tackles object tracking in crowded scenes in high-resolution images with the DynamicTrack framework, which performs particularly well under complex interactions and severe occlusions.
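The way the AML formulas weight their terms can be illustrated with a minimal Python sketch. Only the weighted combination (the roles of μ, β, σ, τ) comes from the summary above. Everything else is an assumption made for illustration: the subscripts bb, hh, and bh are read as body-body, head-head, and body-head pairs; the pull term is taken to be a mean squared distance between embeddings that should match; the push term is taken to be a hinge penalty with a hypothetical `margin`. The paper may define these components differently.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]
Pair = Tuple[Vec, Vec]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pull_term(positive_pairs: List[Pair]) -> float:
    """ASSUMED form: mean squared distance between embeddings that should match."""
    if not positive_pairs:
        return 0.0
    return sum(sq_dist(a, b) for a, b in positive_pairs) / len(positive_pairs)


def push_term(negative_pairs: List[Pair], margin: float = 1.0) -> float:
    """ASSUMED form: hinge penalty on non-matching pairs closer than `margin`."""
    if not negative_pairs:
        return 0.0
    total = sum(max(0.0, margin - sq_dist(a, b) ** 0.5) for a, b in negative_pairs)
    return total / len(negative_pairs)


def aml_loss(
    pos: Dict[str, List[Pair]],
    neg: Dict[str, List[Pair]],
    mu: float = 1.0,     # weight on the bb and hh terms (default value is hypothetical)
    beta: float = 1.0,   # weight on the bh term (default value is hypothetical)
    sigma: float = 1.0,  # weight on L_pull (default value is hypothetical)
    tau: float = 1.0,    # weight on L_push (default value is hypothetical)
    margin: float = 1.0,
) -> float:
    """Combine the per-pair-type terms exactly as in the three formulas."""
    l_pull = mu * (pull_term(pos["bb"]) + pull_term(pos["hh"])) + beta * pull_term(pos["bh"])
    l_push = (
        mu * (push_term(neg["bb"], margin) + push_term(neg["hh"], margin))
        + beta * push_term(neg["bh"], margin)
    )
    return sigma * l_pull + tau * l_push
```

In this sketch, `pos` and `neg` are dictionaries keyed by `"bb"`, `"hh"`, and `"bh"`, each holding a list of embedding pairs. How those pairs are sampled from the detector's output is not described in the summary and is left open here.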
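The cascade matching described under Dynamic Association can be sketched as three successive stages. This is a hedged illustration, not the authors' algorithm: the summary only states that matched head-body pairs, unmatched bodies, and unmatched heads are processed separately with the body as the primary cue. The IoU similarity, the greedy matcher, the body weight `body_w`, and the threshold `thresh` are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Obj = Dict[str, Optional[Box]]           # {"body": Box or None, "head": Box or None}


def iou(a: Optional[Box], b: Optional[Box]) -> float:
    """Intersection over union; 0.0 if either box is missing."""
    if a is None or b is None:
        return 0.0
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_match(
    track_ids: List[int],
    det_ids: List[int],
    score: Callable[[int, int], float],
    thresh: float,
) -> Dict[int, int]:
    """Greedy one-to-one matching by descending score (ASSUMED matcher)."""
    cands = sorted(((score(t, d), t, d) for t in track_ids for d in det_ids), reverse=True)
    matches: Dict[int, int] = {}
    used_dets = set()
    for s, t, d in cands:
        if s < thresh:
            break  # all remaining candidates score lower
        if t in matches or d in used_dets:
            continue
        matches[t] = d
        used_dets.add(d)
    return matches


def cascade_associate(
    tracks: Dict[int, Obj], dets: List[Obj], body_w: float = 0.7, thresh: float = 0.3
) -> Dict[int, int]:
    """Return {track_id: detection_index} after three cascade stages."""
    stages = [
        # Stage 1: detections with both head and body; body weighted higher.
        (lambda d: d["body"] is not None and d["head"] is not None,
         lambda t, d: body_w * iou(t["body"], d["body"])
         + (1 - body_w) * iou(t["head"], d["head"])),
        # Stage 2: body-only detections (head not found or not paired).
        (lambda d: d["body"] is not None and d["head"] is None,
         lambda t, d: iou(t["body"], d["body"])),
        # Stage 3: head-only detections (body occluded).
        (lambda d: d["body"] is None and d["head"] is not None,
         lambda t, d: iou(t["head"], d["head"])),
    ]
    free_tracks = set(tracks)
    result: Dict[int, int] = {}
    for keep, sim in stages:
        det_ids = [i for i, d in enumerate(dets) if keep(d)]
        matched = greedy_match(
            sorted(free_tracks), det_ids, lambda t, d: sim(tracks[t], dets[d]), thresh
        )
        result.update(matched)
        free_tracks -= set(matched)
    return result
```

The head-only stage is what lets a track survive when the body is occluded but the head remains visible. A Hungarian solver could replace the greedy matcher, and appearance embeddings could replace or supplement IoU; the summary does not say which the paper uses.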