Addressing single object tracking in satellite imagery through prompt-engineered solutions

Athena Psalta, Vasileios Tsironis, Andreas El Saer, Konstantinos Karantzalos
2024-07-08
Abstract: Object tracking in satellite videos remains a complex endeavor in remote sensing due to the intricate and dynamic nature of satellite imagery. Existing state-of-the-art trackers in computer vision integrate sophisticated architectures, attention mechanisms, and multi-modal fusion to enhance tracking accuracy across diverse environments. However, the challenges posed by satellite imagery, such as background variations, atmospheric disturbances, and low-resolution object delineation, significantly impede the precision and reliability of traditional Single Object Tracking (SOT) techniques. Our study delves into these challenges and proposes prompt engineering methodologies, leveraging the Segment Anything Model (SAM) and TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), to create a training-free point-based tracking method for small-scale objects on satellite videos. Experiments on the VISO dataset validate our strategy, marking a significant advancement in robust tracking solutions tailored for satellite imagery in remote sensing applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses Single Object Tracking (SOT) in satellite videos. Background variations, atmospheric disturbances, and low-resolution object delineation reduce the accuracy and reliability of traditional SOT techniques. State-of-the-art computer vision trackers integrate sophisticated architectures, attention mechanisms, and multi-modal fusion, yet they still struggle under these conditions. The paper therefore proposes a prompt-engineering approach that combines two pre-trained models, the Segment Anything Model (SAM) and TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), into a training-free, point-based tracking method for small-scale objects in satellite videos. Experiments on the VISO dataset validate the approach, which the authors present as a significant advance toward robust tracking for satellite remote sensing applications.
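
As a rough illustration of how a point tracker and a promptable segmenter could be chained without any training, the sketch below propagates one query point through the video and then uses each per-frame point as a segmentation prompt. It is not the authors' implementation: `track_point` and `segment_from_point` are hypothetical callables standing in for TAPIR and SAM, and the interfaces assumed here (a per-frame point with a visibility flag, a boolean mask per prompt) are assumptions, as is the helper `mask_to_box`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]        # (x, y) in pixel coordinates
Mask = Sequence[Sequence[bool]]    # mask[row][col], True marks an object pixel
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight bounding box of the True pixels, or None if there are none."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def track_with_point_prompts(
    frames: Sequence,
    init_point: Point,
    track_point: Callable[[Sequence, Point], List[Tuple[Point, bool]]],
    segment_from_point: Callable[[object, Point], Mask],
) -> List[Optional[Box]]:
    """Chain a point tracker and a point-prompted segmenter, frame by frame.

    track_point (hypothetical, TAPIR-like): returns one (point, visible)
        pair per frame for the query point given on the first frame.
    segment_from_point (hypothetical, SAM-like): returns a boolean mask
        for the object under the prompt point in a single frame.
    """
    tracked = track_point(frames, init_point)
    boxes: List[Optional[Box]] = []
    for frame, (point, visible) in zip(frames, tracked):
        if not visible:
            # No reliable prompt for this frame, so report no box.
            boxes.append(None)
            continue
        boxes.append(mask_to_box(segment_from_point(frame, point)))
    return boxes
```

Passing the two models in as callables keeps the sketch independent of any particular library API; in practice each callable would wrap whichever TAPIR and SAM inference code is available.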