SPFormer: Self-Pooling Transformer for Few-Shot Hyperspectral Image Classification

Ziyu Li, Zhaohui Xue, Qi Xu, Ling Zhang, Tianzhi Zhu, Mengxue Zhang
DOI: https://doi.org/10.1109/tgrs.2023.3345923
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: The Transformer has shown great potential for extracting global features and, given a large number of training samples, can achieve better classification performance than other deep learning (DL) models. However, most existing Transformer-based models for hyperspectral image (HSI) classification simply use multihead self-attention and channel multilayer perceptron (MLP) modules, which contain many learnable parameters, resulting in poor performance in few-shot learning scenarios. To overcome this issue, a lightweight self-pooling Transformer (SPFormer) is proposed for few-shot HSI classification. First, a one-layer autoencoder based on self-supervised learning is built to reduce the dimensionality of the HSI. Second, two parameter-free modules, channel shuffle for multihead self-pooling with sparse mapping (CSSM-MHSP) and central token mixer (CTM), are proposed for mapping spectral features to higher dimensions and promoting information interaction between pixels, respectively. Third, a lightweight channel embedding is designed to extract deep spectral features. Finally, a fully connected layer is used for classification. The proposed method is evaluated on four benchmark datasets, where, with limited training samples, it outperforms existing state-of-the-art methods in classification accuracy, generalization performance, and model complexity.
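The abstract does not define CSSM-MHSP in detail, so the following is only a minimal sketch of a generic, parameter-free channel shuffle in the ShuffleNet style, which the module's name appears to build on. The function name `channel_shuffle`, the `groups` argument, and the flat list representation of channels are illustrative assumptions and are not taken from the paper.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups without any learnable parameters.

    Assumption: `channels` is a flat list of per-channel values laid out
    group by group, and `groups` divides the channel count evenly. This is
    the generic ShuffleNet-style shuffle, not the paper's exact module.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Equivalent to reshaping to (groups, per_group), transposing,
    # and flattening back to a single channel axis.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels split into two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

Because the operation only permutes channel indices, it adds no weights to the model, which is consistent with the abstract's description of the module as parameter-free.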