Progressive Representation Learning for Real-Time UAV Tracking

Changhong Fu, Xiang Lei, Haobo Zuo, Liangliang Yao, Guangze Zheng, Jia Pan
2024-09-25
Abstract: Visual object tracking has significantly promoted autonomous applications for unmanned aerial vehicles (UAVs). However, learning robust object representations for UAV tracking is especially challenging in complex dynamic environments, when confronted with aspect ratio change and occlusion. These challenges severely alter the original information of the object. To handle the above issues, this work proposes a novel progressive representation learning framework for UAV tracking, i.e., PRL-Track. Specifically, PRL-Track is divided into coarse representation learning and fine representation learning. For coarse representation learning, two innovative regulators, which rely on appearance and semantic information, are designed to mitigate appearance interference and capture semantic information. Furthermore, for fine representation learning, a new hierarchical modeling generator is developed to intertwine coarse object representations. Exhaustive experiments demonstrate that the proposed PRL-Track delivers exceptional performance on three authoritative UAV tracking benchmarks. Real-world tests indicate that the proposed PRL-Track realizes superior tracking performance with 42.6 frames per second on the typical UAV platform equipped with an edge smart camera. The code, model, and demo videos are available at https://github.com/vision4robotics/PRL-Track.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper primarily aims to address several key challenges in UAV (Unmanned Aerial Vehicle) visual tracking, particularly the issues of aspect ratio changes and occlusion in complex dynamic environments. Specifically:

1. **Robust Object Representation Learning**: In complex dynamic environments, existing methods struggle to obtain robust object representations, especially when facing aspect ratio changes and occlusion. This paper proposes a novel Progressive Representation Learning framework (PRL-Track) designed to enhance UAV tracking performance through a coarse-to-fine learning strategy.
2. **Combining the Strengths of CNN and ViT**: Traditional CNN-based trackers can extract local spatial information but have limitations in handling global contextual information. Introducing ViT can compensate for this shortcoming, but ViT often overlooks local details. Therefore, this paper attempts to effectively combine CNN and ViT to fully leverage the advantages of both, achieving more reliable representation learning.
3. **Real-time Application Requirements**: Existing deep network structures like ResNet can better learn object representations but cannot meet the real-time tracking requirements of UAVs due to limited computational resources. The proposed PRL-Track not only improves tracking accuracy but also achieves efficient operation at 42.6 frames per second on typical UAV platforms.

In summary, this paper aims to address the challenges of aspect ratio changes, occlusion, and other issues in UAV visual tracking through a novel Progressive Representation Learning framework, ensuring robustness and real-time performance in complex dynamic environments.
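
The reported throughput of 42.6 frames per second can also be read as an average per-frame time budget. The snippet below is a minimal arithmetic sketch, not code from the PRL-Track repository. The helper name `frame_budget_ms` and the 30 FPS comparison point are assumptions introduced here for illustration; the paper summary does not state which threshold it treats as real-time.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


reported_fps = 42.6          # throughput quoted in the abstract
assumed_realtime_fps = 30.0  # assumption: an illustrative threshold, not from the paper

# Average time the whole tracking pipeline spends on one frame at 42.6 FPS.
budget = frame_budget_ms(reported_fps)

# Slack relative to the assumed 30 FPS threshold.
headroom = frame_budget_ms(assumed_realtime_fps) - budget

print(f"per-frame budget at {reported_fps} FPS: {budget:.1f} ms")
print(f"headroom vs. {assumed_realtime_fps:.0f} FPS: {headroom:.1f} ms")
```

Under these assumptions, each frame is processed in roughly 23.5 ms on average, which leaves about 10 ms of slack against a 30 FPS target.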