Domain Adaptation-aware Transformer for Hyperspectral Object Tracking

Yinan Wu, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Lingling Li
DOI: https://doi.org/10.1109/tcsvt.2024.3385273
2024-01-01
Abstract: Visual object tracking in natural scenes is a popular but challenging task because target appearance is hard to represent under changes such as scale variation, deformation, illumination change, rotation, motion blur, and background clutter. High-speed hyperspectral imaging systems capture hyperspectral videos (HSVs) over wide spectral ranges and provide abundant spectral and spatial information for telling targets apart from backgrounds, which alleviates the model drift of appearance-based tracking methods. However, different hyperspectral imagers, such as near-infrared (NIR), red-to-near-infrared (RedNIR), and visible (VIS), produce heterogeneous data that common object trackers cannot handle. In this paper, a domain-adaptive Transformer framework is proposed for hyperspectral object tracking. Because the HSVs come from different types of sensors, their heterogeneous features are learned adversarially through domain label reverse learning with a gradient reversal layer. To fully utilize the spectral information in HSV frames, a band-wise spatial attention module (BSAM) is designed to emphasize the salient area near the target of interest. We adopt a Siamese-like Transformer tracker as the main structure for tracking. Our tracker outperforms top-ranking methods on a hyperspectral object tracking benchmark dataset containing 87 hyperspectral videos of three types. The comparison experiments validate the effectiveness of the proposed method. The source code and trained models of this work will be publicly available soon at https://github.com/LianYi233/Trans-DAT.
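
The adversarial step mentioned in the abstract relies on a gradient reversal layer. The abstract gives no implementation details, so the following is only a minimal pure-Python sketch of the generic idea: the layer is the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass, so the feature extractor is pushed to make sensor domains harder to distinguish. The function names, the scaling factor `lam`, the toy logistic domain classifier, and the two-domain label are all assumptions made for illustration and are not taken from the paper, which covers three sensor types.

```python
import math


def grl_forward(features):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(features)


def grl_backward(grad_output, lam=1.0):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return [-lam * g for g in grad_output]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_grad(features, weights, bias, domain_label):
    """Gradient of binary cross-entropy w.r.t. the input features for a
    toy logistic domain classifier (hypothetical, two domains only)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    p = sigmoid(z)
    # dL/dz = p - y, hence dL/df_i = (p - y) * w_i
    return [(p - domain_label) * w for w in weights]


# Toy feature vector from a hypothetical backbone, and classifier weights.
features = [0.5, -1.0, 2.0]
weights = [0.2, 0.4, -0.1]

# The domain classifier sees the features unchanged.
passed = grl_forward(features)
grad_cls = domain_loss_grad(passed, weights, bias=0.0, domain_label=1)

# The feature extractor receives the reversed, scaled gradient.
grad_to_extractor = grl_backward(grad_cls, lam=0.5)
```

In an autograd framework the same behavior would normally be written as a custom function with an identity forward and a negated backward; the manual version here only makes the sign flip explicit.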