Object-Level Pseudo-3D Lifting for Distance-Aware Tracking
Haoyuan Jin, Xuesong Nie, Yunfeng Yan, Xi Chen, Zhihang Zhu, Donglian Qi
DOI: https://doi.org/10.1145/3664647.3680783
2024-01-01
Abstract: Multi-object tracking (MOT) is a pivotal task in media interpretation, where reliable motion and appearance cues are essential for preserving object identities across frames. However, because 2D space is constrained by its inherent perspective properties, crowd density and frequent occlusions in real-world scenes expose the fragility of these cues. We observe that objects are naturally better separated in higher-dimensional space and propose a novel 2D MOT framework, "Detecting-Lifting-Tracking" (DLT). First, a pre-trained detector captures 2D object information. Second, we introduce a Mamba Distance Estimator that obtains temporally consistent distances from objects to a monocular camera, achieving object-level pseudo-3D lifting. Finally, we thoroughly explore distance-aware tracking using this pseudo-3D information. Specifically, we introduce Score-Distance Hierarchical Matching and Short-Long Terms Association to improve the accuracy and robustness of association. Even without appearance cues, DLT achieves state-of-the-art performance on MOT17, MOT20, and DanceTrack, demonstrating its potential for addressing occlusion challenges.
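The abstract does not specify how distances enter the association step, so the following is only a minimal sketch of one plausible form of score-distance hierarchical matching, under stated assumptions. Each box carries a scalar camera distance (hardcoded here, standing in for the output of a distance estimator such as the Mamba Distance Estimator). The matching cost adds a relative distance gap, weighted by a hypothetical `lam`, to the usual `1 - IoU` term. Detections are split by a hypothetical score threshold `high`, and a greedy lowest-cost-first assignment replaces Hungarian matching for brevity. The names `Box`, `cost`, `greedy_match`, and `hierarchical_match` and all thresholds are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A 2D box lifted with a scalar camera distance (pseudo-3D)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 1.0
    distance: float = 0.0  # assumed: distance to the camera from some estimator


def iou(a: Box, b: Box) -> float:
    """Standard 2D intersection-over-union of two axis-aligned boxes."""
    w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = w * h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def cost(track: Box, det: Box, lam: float) -> float:
    """Hypothetical fused cost: 2D overlap term plus a relative distance gap."""
    gap = abs(track.distance - det.distance) / max(track.distance, det.distance, 1e-6)
    return (1.0 - iou(track, det)) + lam * gap


def greedy_match(tracks, dets, lam, max_cost):
    """Greedy lowest-cost-first assignment (a stand-in for Hungarian matching)."""
    pairs = sorted(
        (cost(t, d, lam), i, j) for i, t in enumerate(tracks) for j, d in enumerate(dets)
    )
    used_t, used_d, matches = set(), set(), []
    for c, i, j in pairs:
        if c > max_cost:
            break  # pairs are sorted, so every remaining pair is too costly
        if i in used_t or j in used_d:
            continue
        used_t.add(i)
        used_d.add(j)
        matches.append((i, j))
    unmatched = [i for i in range(len(tracks)) if i not in used_t]
    return matches, unmatched


def hierarchical_match(tracks, dets, lam=0.5, high=0.6, max_cost=1.0):
    """Two stages: match high-score detections first, then low-score ones."""
    high_idx = [j for j, d in enumerate(dets) if d.score >= high]
    low_idx = [j for j, d in enumerate(dets) if d.score < high]

    # Stage 1: all tracks against confident detections.
    m1, left = greedy_match(tracks, [dets[j] for j in high_idx], lam, max_cost)
    matches = [(i, high_idx[j]) for i, j in m1]

    # Stage 2: still-unmatched tracks against low-score (often occluded) detections.
    m2, _ = greedy_match([tracks[i] for i in left], [dets[j] for j in low_idx], lam, max_cost)
    matches += [(left[i], low_idx[j]) for i, j in m2]
    return sorted(matches)


# Two tracks that a detection overlaps equally in 2D, but at different distances.
tracks = [Box(0, 0, 10, 20, distance=5.0), Box(4, 0, 14, 20, distance=12.0)]
dets = [
    Box(2, 0, 12, 20, score=0.9, distance=12.1),
    Box(0, 0, 10, 20, score=0.3, distance=5.0),
]
print(hierarchical_match(tracks, dets))           # [(0, 1), (1, 0)]
print(hierarchical_match(tracks, dets, lam=0.0))  # [(0, 0), (1, 1)]
```

In this toy scene the high-score detection has IoU 2/3 with both tracks, so IoU alone cannot separate them and the tie is broken arbitrarily by index (`lam=0.0`). With the distance term, the detection goes to the farther track, and the nearer track is recovered in the second stage by the low-score detection.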