Learnable Prompts-Based Transformers for Domain Generalization of Hyperspectral Image Classification

Baofa He, Jie Feng, Di Li, Ronghua Shang, Jinjian Wu, Licheng Jiao
DOI: https://doi.org/10.1109/igarss53475.2024.10640533
2024-01-01
Abstract: Large-scale pre-trained vision-language alignment models, such as Contrastive Language-Image Pre-training (CLIP), have demonstrated significant potential for learning representations that transfer to domain generalization tasks. In hyperspectral image (HSI) classification, a major challenge in deploying such models lies in prompt engineering, which requires particular expertise and a substantial time investment. Moreover, existing methods ignore the correlation information across spectral bands. To address these issues, this paper proposes a novel method named learnable prompts-based Transformer (LPFormer). In LPFormer, cross-band correlation information is extracted by the self-attention of the transformer and converted into positional embeddings within the transformer framework to obtain the visual features. Subsequently, prompt words are modeled as learnable parameters, so that effective prompts are learned rather than hand-crafted. Finally, a contrastive learning method is used to align the visual and textual features. Experimental results on two HSI datasets show that the proposed LPFormer outperforms other domain adaptation methods.
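The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea it describes: prompt words represented as learnable vectors shared across classes, and a contrastive-style loss that aligns visual and textual features. Everything here is an assumption made for illustration, not the authors' LPFormer: the names `build_prompts`, `encode_text`, and `alignment_loss` are hypothetical, mean pooling stands in for a real text encoder, random vectors stand in for the transformer's visual features, and the temperature value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_prompts(context, class_tokens):
    """Prepend the shared learnable context vectors to each class-name embedding.

    context:      (n_ctx, d) array, the parameters that would be trained
    class_tokens: (n_cls, d) array, one embedding per class name
    returns:      (n_cls, n_ctx + 1, d) array of prompt token sequences
    """
    n_cls = class_tokens.shape[0]
    ctx = np.broadcast_to(context, (n_cls,) + context.shape)
    return np.concatenate([ctx, class_tokens[:, None, :]], axis=1)

def encode_text(prompts):
    """Hypothetical stand-in for a text encoder: mean-pool the prompt tokens."""
    return prompts.mean(axis=1)

def alignment_loss(visual, text, labels, temperature=0.07):
    """Softmax cross-entropy over cosine similarities (image-to-text direction)."""
    logits = l2_normalize(visual) @ l2_normalize(text).T / temperature
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, logits

# Toy sizes, chosen arbitrarily for the illustration.
n_cls, n_ctx, d, batch = 3, 4, 8, 5
context = rng.normal(size=(n_ctx, d))        # learnable prompt vectors
class_tokens = rng.normal(size=(n_cls, d))   # fixed class-name embeddings
labels = np.array([0, 1, 2, 1, 0])

prompts = build_prompts(context, class_tokens)
text_features = encode_text(prompts)
visual_features = rng.normal(size=(batch, d))  # placeholder for HSI features

loss, logits = alignment_loss(visual_features, text_features, labels)
predictions = logits.argmax(axis=1)
```

In an actual training loop, `context` would be updated by gradient descent on `loss`, typically with an automatic-differentiation framework; the sketch only evaluates the forward pass.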