Foundation Model-Based Spectral–Spatial Transformer for Hyperspectral Image Classification

Lingbo Huang,Yushi Chen,Xin He
DOI: https://doi.org/10.1109/tgrs.2024.3456129
IF: 8.2
2024-09-21
IEEE Transactions on Geoscience and Remote Sensing
Abstract:Recently, deep learning models have dominated hyperspectral image (HSI) classification. Nowadays, deep learning is undergoing a paradigm shift with the rise of transformer-based foundation models. In this study, the potential of transformer-based foundation models, including the vision foundation model (VFM) and language foundation model (LFM), for HSI classification are investigated. First, to improve the performance of traditional HSI classification tasks, a spectral-spatial VFM-based transformer (SS-VFMT) is proposed, which inserts spectral-spatial information into the pretrained foundation transformer. Specifically, a given pretrained transformer receives HSI patch tokens for long-range feature extraction benefiting from the prelearned weights. Meanwhile, two enhancement modules, i.e., spatial and spectral enhancement modules (SpaEMs SpeEMs), utilize spectral and spatial information for steering the behavior of the transformer. Besides, an additional patch relationship distillation strategy is designed for SS-VFMT to exploit the pretrained knowledge better, leading to the proposed SS-VFMT-D. Second, based on SS-VFMT, to address a new HSI classification task, i.e., generalized zero-shot classification, a spectral-spatial vision-language-based transformer (SS-VLFMT) is proposed. This task is to recognize novel classes not seen during training, which is more meaningful as the real world is usually open. The SS-VLFMT leverages SS-VFMT to extract spectral-spatial features and corresponding hash codes while integrating a pretrained language model to extract text features from class names. Experimental results on HSI datasets reveal that the proposed methods are competitive compared to the state-of-the-art methods. Moreover, the foundation model-based methods open a new window for HSI classification tasks, especially for HSI zero-shot classification.
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics
What problem does this paper attempt to address?