Abstract: Drone aerial imaging has become increasingly important across many fields as drone optical sensor technology advances. A central challenge in this domain is multi-object tracking that is both accurate and efficient. Traditional deep learning methods often separate object identification from tracking, which adds complexity and can degrade performance. Conventional approaches also rely heavily on manual feature engineering and intricate algorithms, which further limits efficiency. To overcome these limitations, we propose a Transformer-based end-to-end multi-object tracking framework. The method uses self-attention to capture complex inter-object relationships and integrates object detection and tracking into a single process. End-to-end training simplifies the tracking pipeline and leads to significant performance improvements. A key component of the system is a trajectory-detection label matching technique, which assigns labels from a joint assessment of object appearance, spatial characteristics, and Gaussian features, yielding more precise and consistent label assignments. We also incorporate cross-frame self-attention to extract long-term object properties, which provide robust information for stable tracking. A newly developed self-characteristics module further extracts semantic features from trajectory information in the current and previous frames, so that the long-term interaction modules remain semantically consistent and tracking stays accurate and continuous over time. The refined data and stored trajectories are then used as input for processing subsequent frames, creating a feedback loop that sustains tracking accuracy. Extensive experiments on the VisDrone and UAVDT datasets demonstrate the superior performance of our approach in drone-based multi-object tracking.
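
As an illustration of the trajectory-detection label matching idea, the following Python sketch combines an appearance cost, a spatial cost, and a Gaussian cost into one matching cost and picks the lowest-cost assignment. The abstract does not give the actual formulas, so everything here is an assumption: cosine distance stands in for appearance, 1 - IoU for spatial characteristics, and a Wasserstein distance between boxes modelled as 2D Gaussians for the Gaussian features. The equal weights, the `scale` constant, and all function names are hypothetical. The brute-force search is only practical for a handful of objects; a Hungarian solver would normally replace it.

```python
import math
from itertools import permutations


def cosine_cost(a, b):
    """Appearance cost: 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gaussian_cost(a, b, scale=32.0):
    """Assumed Gaussian cost: model each box as a 2D Gaussian
    (mean = centre, std = half width/height), take the 2-Wasserstein
    distance between them, and squash it into [0, 1)."""
    def params(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2, (y1 + y2) / 2, (x2 - x1) / 2, (y2 - y1) / 2)

    w2 = sum((p - q) ** 2 for p, q in zip(params(a), params(b)))
    return 1.0 - math.exp(-math.sqrt(w2) / scale)


def pair_cost(track, det, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three cues (weights are hypothetical)."""
    w_app, w_iou, w_gauss = weights
    return (w_app * cosine_cost(track["feat"], det["feat"])
            + w_iou * (1.0 - iou(track["box"], det["box"]))
            + w_gauss * gaussian_cost(track["box"], det["box"]))


def match_tracks(tracks, dets):
    """Return sorted (track_index, detection_index) pairs with minimum
    total cost, found by exhaustive search over assignments."""
    if not tracks or not dets:
        return []
    cost = [[pair_cost(t, d) for d in dets] for t in tracks]
    n_t, n_d = len(tracks), len(dets)
    if n_t <= n_d:
        best = min(permutations(range(n_d), n_t),
                   key=lambda p: sum(cost[i][j] for i, j in enumerate(p)))
        return list(enumerate(best))
    best = min(permutations(range(n_t), n_d),
               key=lambda p: sum(cost[i][j] for j, i in enumerate(p)))
    return sorted((i, j) for j, i in enumerate(best))


# Toy example: the detections arrive in the opposite order to the tracks.
tracks = [{"box": (10, 10, 30, 30), "feat": (1.0, 0.0)},
          {"box": (100, 100, 120, 120), "feat": (0.0, 1.0)}]
dets = [{"box": (102, 101, 122, 121), "feat": (0.1, 0.9)},
        {"box": (11, 12, 31, 32), "feat": (0.9, 0.1)}]
print(match_tracks(tracks, dets))  # [(0, 1), (1, 0)]
```

In this toy case all three cues agree, so track 0 is paired with detection 1 and track 1 with detection 0. The cues matter most when they disagree, for example when small objects barely overlap between frames and IoU alone gives no signal.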

End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models
