Temporal-Spatial Feature Interaction Network for Multi-Drone Multi-Object Tracking

Han Wu, Hao Sun, Kefeng Ji, Gangyao Kuang
DOI: https://doi.org/10.1109/tcsvt.2024.3478758
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Multi-drone multi-object tracking (MDMOT) aims to localize and identify targets in videos captured simultaneously by multiple drones. Existing methods typically localize targets first and then associate them to obtain identities. However, both their localization and identification stages rely heavily on single-frame information. As a result, localization is very sensitive to visual information decay, and identification struggles to capture discriminative representations. Consequently, these methods usually perform unreliably in challenging scenarios such as occlusion and high similarity among targets. To address this, we introduce a novel MDMOT framework that lets temporal and spatial features interact, exploring the guidance of tracklet information across time and space. Specifically, we introduce temporal-spatial feedback loops to enrich the cues available to our tracker. A novel temporal-oriented target localization network enhances the response to difficult samples in feature space by using prior knowledge from existing tracklets beyond the current frame. Moreover, a spatial-oriented target identification network synergizes cross-drone tracklet information to provide discriminative representations for target identification: it combines target and background information to extract identity representations and lets features from multiple drones interact. To the best of our knowledge, this work reports the first MDMOT system that synergizes features across multiple drones to track targets. By incorporating these two networks, we develop a robust tracker named TSMMT. Extensive experiments on the public MDMT dataset demonstrate the superiority of the proposed model. Specifically, TSMMT outperforms state-of-the-art methods by 2.76%∼4.66% on MOTA and 2.06%∼3.33% on IDF1.
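For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of their standard definitions (CLEAR-MOT accuracy and identity F1). It is not taken from the paper: the function names and the counts in the usage lines are hypothetical, and the paper's exact evaluation protocol on MDMT may differ in details such as how matches are determined.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence;
    num_gt is the total number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Standard identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    IDTP, IDFP and IDFN are identity-level true positives, false
    positives and false negatives under the best global matching
    between predicted and ground-truth trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

MOTA penalizes detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved, which is why the abstract reports gains on both.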