Transformer Tracking for Satellite Video: Matching, Propagation, and Prediction
Manqi Zhao, Shengyang Li, Jian Yang
DOI: https://doi.org/10.1109/tgrs.2024.3501380
IF: 8.2
2024-11-29
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Recently, transformer-based trackers have shown overwhelming advantages on general video. However, their performance on satellite video has been hindered by insufficient satellite-specific training and a lack of designs tailored to the characteristics of satellite targets and scenes. To tackle these challenges, we propose a novel transformer-based framework for satellite video object tracking: Transformer Matching, Propagation, and Prediction (TransMPP). TransMPP combines three stages (static matching, dynamic propagation, and prediction) to ensure accurate tracking in satellite videos. Specifically, the Matching model uses a one-stream pipeline that performs feature extraction and relationship modeling simultaneously across extensive search and template areas, thereby improving foreground-background discrimination. In addition, the Propagation and Prediction models enhance temporal modeling through local long-term and short-term feature propagation and global sequence prediction, respectively, boosting tracking robustness. Moreover, to ensure fair comparison and evaluation, we also developed SatSOT-train, a large-scale training dataset for the SatSOT benchmark. After comprehensive training, TransMPP achieves state-of-the-art (SOTA) performance on the SatSOT dataset, with an area under the curve (AUC) score of 59.9% and a precision score of 71.5%, improvements of 6.3% and 5.3%, respectively. The code will be available at https://github.com/DonDominic/TransMPP.
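The AUC and precision scores quoted in the abstract are standard single-object tracking metrics. As a rough illustration of how such scores are commonly computed from per-frame boxes, the following is a minimal sketch and not the authors' evaluation code. It assumes boxes in (x, y, w, h) format, a success curve sampled at 21 evenly spaced IoU thresholds, and a caller-supplied pixel threshold for precision; all function names are hypothetical, and the exact thresholds used by the SatSOT protocol should be taken from the benchmark itself.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return math.hypot(dx, dy)


def success_auc(preds, gts, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    Assumption: a frame counts as a success when IoU is strictly greater
    than the threshold, which is one common convention.
    """
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def precision(preds, gts, pixel_threshold):
    """Fraction of frames whose center error is within pixel_threshold."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixel_threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Toy ground truth and predictions, invented purely for illustration.
    gts = [(10, 10, 8, 8), (12, 11, 8, 8), (14, 12, 8, 8)]
    preds = [(10, 10, 8, 8), (13, 11, 8, 8), (30, 30, 8, 8)]
    print("success AUC:", success_auc(preds, gts))
    print("precision:", precision(preds, gts, pixel_threshold=5.0))
```

Because satellite targets often span only a few pixels, the pixel threshold passed to `precision` has a large effect on the reported score, which is why it is left as an explicit argument here instead of a fixed default.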
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics