SSTtrack: A Unified Hyperspectral Video Tracking Framework via Modeling Spectral-Spatial-Temporal Conditions
Yuzeng Chen, Qiangqiang Yuan, Yuqi Tang, Yi Xiao, Jiang He, Te Han, Zhenqi Liu, Liangpei Zhang
DOI: https://doi.org/10.2139/ssrn.4860918
IF: 18.6
2024-09-02
Information Fusion
Abstract: Hyperspectral video contains rich spectral, spatial, and temporal conditions that are crucial for capturing complex object variations and overcoming the inherent limitations (e.g., multi-device imaging, modality alignment, and finite spectral bands) of regular RGB and multi-modal video tracking. However, existing hyperspectral tracking methods frequently encounter issues including data anxiety, band gap, huge volume, and weak use of the temporal condition embedded in video sequences, all of which result in unsatisfactory tracking capabilities. To address these issues, we present SSTtrack, a unified hyperspectral video tracking framework that models spectral-spatial-temporal conditions end-to-end. First, we design the multi-modal generation adapter (MGA) to explore the interpretability benefits of combining physical and machine models for learning the multi-modal generation and bridging the band gap. Second, we construct a novel spectral-spatial adapter (SSA) to dynamically transfer and interact with multiple modalities. Finally, we design a temporal condition adapter (TCA) that injects the temporal condition to guide spectral and spatial feature representations in capturing static and instantaneous object properties. SSTtrack follows the prompt learning paradigm, adding only a few trainable parameters (0.575M), and achieves superior performance in extensive comparisons. The code will be released at https://github.com/YZCU/SSTtrack.
computer science, artificial intelligence, theory & methods
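
The abstract states that SSTtrack follows the prompt learning paradigm and adds only a few trainable parameters. As a rough illustration of that general idea, and not the authors' implementation, the sketch below tallies the parameters of a frozen transformer-style backbone with one small trainable bottleneck adapter per block. The names `Linear`, `frozen_block`, and `bottleneck_adapter` and the sizes `DIM`, `DEPTH`, and `HIDDEN` are hypothetical choices made for this example. They are not taken from the paper and are not meant to reproduce its 0.575M figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    """Parameter bookkeeping for one dense layer (weights plus bias)."""
    in_dim: int
    out_dim: int
    trainable: bool

    @property
    def num_params(self) -> int:
        return self.in_dim * self.out_dim + self.out_dim


def frozen_block(dim: int, mlp_ratio: int = 4) -> list[Linear]:
    """Rough stand-in for one pretrained block: QKV, projection, 2-layer MLP."""
    return [
        Linear(dim, 3 * dim, trainable=False),
        Linear(dim, dim, trainable=False),
        Linear(dim, mlp_ratio * dim, trainable=False),
        Linear(mlp_ratio * dim, dim, trainable=False),
    ]


def bottleneck_adapter(dim: int, hidden: int) -> list[Linear]:
    """Down-project then up-project; only these layers are trained."""
    return [
        Linear(dim, hidden, trainable=True),
        Linear(hidden, dim, trainable=True),
    ]


def count_params(layers: list[Linear]) -> tuple[int, int]:
    """Return (trainable, total) parameter counts."""
    trainable = sum(layer.num_params for layer in layers if layer.trainable)
    total = sum(layer.num_params for layer in layers)
    return trainable, total


# Hypothetical sizes, chosen only for illustration.
DIM, DEPTH, HIDDEN = 768, 12, 8

layers: list[Linear] = []
for _ in range(DEPTH):
    layers += frozen_block(DIM)
    layers += bottleneck_adapter(DIM, HIDDEN)

trainable, total = count_params(layers)
print(f"trainable: {trainable / 1e6:.3f}M of {total / 1e6:.3f}M "
      f"({100 * trainable / total:.2f}%)")
```

Under these assumed sizes the adapters account for well under 1% of all parameters, which is the kind of budget the prompt learning paradigm aims for: the pretrained weights stay fixed and only the lightweight adapters are updated.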