Abstract: Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target re-identification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues in modelling object relationships, especially under challenging tracking conditions such as object deformation and blurring. To address these issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in a sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embeddings based on adjacent-frame cooperation. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate that STCMOT sets a new state-of-the-art performance in the MOTA and IDF1 metrics. The source code is released at https://github.com/ydhcg-BoBo/STCMOT.
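
The abstract describes the temporal embedding boosting module only at a high level, so what follows is a minimal sketch of the general idea of adjacent-frame channel reweighting, not the authors' module. It assumes a squeeze-and-excitation-style design: each channel of the previous and current ReID feature maps is global-average-pooled into a descriptor, the two descriptors are averaged, and a sigmoid turns the result into a per-channel weight applied to the current frame. The learned layers a real module would contain are omitted, and the names `channel_descriptor` and `boost_embedding` are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: a feature map is C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def channel_descriptor(fmap: FeatureMap) -> List[float]:
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def boost_embedding(prev: FeatureMap, curr: FeatureMap) -> FeatureMap:
    """Reweight the channels of the current ReID map using descriptors
    pooled from both the previous and the current frame.

    Assumption: the two descriptors are simply averaged and squashed with a
    sigmoid; no learned excitation layer is applied.
    """
    d_prev = channel_descriptor(prev)
    d_curr = channel_descriptor(curr)
    # Fuse adjacent-frame descriptors into one weight in (0, 1) per channel.
    weights = [sigmoid((a + b) / 2.0) for a, b in zip(d_prev, d_curr)]
    # Scale every spatial value of each current-frame channel by its weight.
    return [[[w * v for v in row] for row in ch] for w, ch in zip(weights, curr)]


if __name__ == "__main__":
    prev = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    curr = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
    out = boost_embedding(prev, curr)
    print(out[0][0][0])  # 2 * sigmoid(1.5), roughly 1.635
```

Channels that respond strongly in both frames receive weights close to 1, while channels that are weak or inconsistent across the two frames are damped; this is one plausible reading of "adjacent frame cooperation", not a confirmed description of TEBM.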

### What problems does this paper attempt to solve?

This paper addresses several key challenges in multiple object tracking (MOT) on videos captured by unmanned aerial vehicles (UAVs). The authors point out that current MOT methods rely mainly on accurate object detection and precise object re-identification (ReID), but often overlook temporal cues when modeling object relationships, and therefore perform poorly under difficult tracking conditions such as object deformation and blurring. To address these challenges, the authors propose a novel Spatio-Temporal Cohesion Multiple Object Tracking (STCMOT) framework, which uses historical embedding features to model the ReID and detection features in sequential order, improving the robustness and accuracy of tracking.

### Main problems and solutions

1. **Spatial attributes are optimized while temporal cues are ignored**:
   - When optimizing the spatial attributes of objects, current methods neglect the temporal cues needed to model relationships between objects.
   - **Solution**: A temporal embedding boosting module (TEBM) is introduced. It combines the ReID feature maps of adjacent frames to generate channel-level descriptors that make individual embeddings more discriminative.
2. **Performance degrades under difficult tracking conditions**:
   - Existing methods are easily affected by conditions such as object deformation and blurring.
   - **Solution**: A temporal detection refinement module (TDRM) is designed. It propagates trajectory embeddings to mine salient object locations in the temporal domain, which improves detection performance.
3. **Resource consumption and efficiency**:
   - Traditional two-stage tracking frameworks use separate networks for object detection and embedding extraction, which leads to high storage costs and heavy resource consumption.
   - **Solution**: STCMOT adopts a one-shot tracking framework that integrates the detection branch and the ReID branch into a single network, balancing tracking performance and speed.

### Experimental results

The experiments show that STCMOT achieves new state-of-the-art performance on the VisDrone2019 and UAVDT datasets, with strong results on both the MOTA and IDF1 metrics. This supports the effectiveness of STCMOT for multiple object tracking in UAV videos.

### Summary

By introducing spatio-temporal cues and strengthening feature representations, this paper targets the weak performance of existing MOT methods in complex scenarios and offers a more efficient and accurate approach to multiple object tracking in UAV videos.
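
The two headline metrics follow standard definitions that the text does not spell out. MOTA (from the CLEAR MOT metrics) is 1 − (FN + FP + IDSW) / GT, penalizing false negatives, false positives, and identity switches relative to the number of ground-truth boxes; IDF1 is 2·IDTP / (2·IDTP + IDFP + IDFN), the F1 score of identity-level matches. The sketch below only shows this final arithmetic; producing the counts (in particular the trajectory-level matching behind IDTP, IDFP, and IDFN) is the job of evaluation toolkits such as py-motmetrics or TrackEval. The counts in the usage lines are made up for illustration and are not results from the paper.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy (CLEAR MOT).

    fn, fp and idsw are false negatives, false positives and identity
    switches summed over all frames; num_gt is the total number of
    ground-truth boxes. MOTA becomes negative when errors exceed num_gt.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of ID precision and ID recall.

    idtp, idfp and idfn are identity-level true positives, false positives
    and false negatives obtained after matching whole trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("no predictions and no ground truth")
    return 2 * idtp / denom


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(round(mota(fn=100, fp=50, idsw=10, num_gt=1000), 4))  # 0.84
    print(round(idf1(idtp=800, idfp=150, idfn=200), 4))  # 0.8205
```

MOTA is dominated by detection errors (FN and FP), whereas IDF1 rewards keeping the same identity on a target over time, which is why a method aimed at better temporal association, as STCMOT is, reports both.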

STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Online Multi-Object Tracking from A Bird's-Eye View by Fusion of Millimeter-Wave Radar and Vision

Uncertainty-aware Unsupervised Multi-Object Tracking

MAT: Motion-Aware Multi-Object Tracking

Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

Multi-Object Tracking Meets Moving UAV

DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects

Multi-object tracking with deep learning ensemble for unmanned aerial system applications

CSCMOT: Multi-object tracking based on channel spatial cooperative attention mechanism

CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking With Camera-LiDAR Fusion

View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV

ST-TrackNet: A Multiple-Object Tracking Network Using Spatio-Temporal Information

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

STURE: Spatial-Temporal Mutual Representation Learning for Robust Data Association in Online Multi-Object Tracking

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

Towards Real-Time Multi-Object Tracking