Spatial–Spectral 1DSwin Transformer with Groupwise Feature Tokenization for Hyperspectral Image Classification

Yifei Xu, Yixuan Xie, Bicheng Li, Chuanqi Xie, Yongchuan Zhang, Aichen Wang, Li Zhu
DOI: https://doi.org/10.1109/tgrs.2023.3294424
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Hyperspectral image (HSI) classification aims to assign each pixel to a land-cover category and is receiving increasing attention from both industry and academia. The main challenge lies in capturing reliable and informative spatial and spectral dependencies concealed in the HSI for each class. To address this challenge, we propose a spatial–spectral 1DSwin (SS1DSwin) Transformer with groupwise feature tokenization for HSI classification, which reveals local and hierarchical spatial–spectral relationships from two different perspectives. SS1DSwin mainly consists of a groupwise feature tokenization module (GFTM) and a 1DSwin Transformer with cross-block normalized connection module (TCNCM). In GFTM, we reorganize an image patch into overlapping cubes and then generate groupwise token embeddings with multihead self-attention (MSA) to learn the local spatial–spectral relationship along the spatial dimension. In TCNCM, we adopt the shifted windowing strategy to capture the hierarchical spatial–spectral relationship along the spectral dimension with 1-D window-based MSA (1DW-MSA) and 1-D shifted window-based MSA (1DSW-MSA), and we use a cross-block normalized connection (CNC) to adaptively fuse the feature maps from different blocks. SS1DSwin applies these two modules in sequence and predicts the class label for each pixel. To test the effectiveness of the proposed method, extensive experiments are conducted on four HSI datasets, and the results indicate that SS1DSwin outperforms several current state-of-the-art methods. The source code of the proposed method is available at https://github.com/Minato252/SS1DSwin .
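The abstract names 1DW-MSA and 1DSW-MSA along the spectral dimension but not their mechanics. The sketch below is a minimal NumPy illustration, assuming the standard Swin Transformer recipe (non-overlapping windows, a cyclic shift via `np.roll`, and an attention mask so wrapped-around tokens do not attend across the seam) transplanted to a 1-D sequence of spectral tokens. It is not the SS1DSwin implementation: it uses a single head with random projections and omits LayerNorm, the MLP, residual connections, GFTM, and CNC. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating Swin-style 1-D window attention;
# names and simplifications are assumptions, not the SS1DSwin code.


def window_partition_1d(x, window):
    """Reshape a (length, channels) token sequence into (num_windows, window, channels)."""
    length, channels = x.shape
    assert length % window == 0, "length must be divisible by the window size"
    return x.reshape(length // window, window, channels)


def shifted_window_mask(length, window, shift):
    """Return a (num_windows, window, window) mask; True marks pairs allowed to attend.

    Labels are assigned in rolled coordinates (as in Swin): after rolling by -shift,
    the last window holds the sequence tail plus tokens wrapped from the head,
    and the mask keeps those two groups apart.
    """
    assert 0 < shift < window
    labels = np.zeros(length, dtype=int)
    labels[length - window:length - shift] = 1  # tail tokens inside the last window
    labels[length - shift:] = 2                  # head tokens wrapped around by the roll
    groups = labels.reshape(-1, window)
    return groups[:, :, None] == groups[:, None, :]


def window_attention(windows, wq, wk, wv, mask=None):
    """Single-head scaled dot-product attention computed independently per window."""
    q, k, v = windows @ wq, windows @ wk, windows @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def swin_1d_block(x, window, shift, wq, wk, wv):
    """Window MSA (shift == 0) or shifted-window MSA (shift > 0) along the token axis."""
    length = x.shape[0]
    if shift:
        x = np.roll(x, -shift, axis=0)  # cyclic shift so windows straddle old boundaries
    mask = shifted_window_mask(length, window, shift) if shift else None
    out = window_attention(window_partition_1d(x, window), wq, wk, wv, mask)
    out = out.reshape(length, -1)
    if shift:
        out = np.roll(out, shift, axis=0)  # undo the shift to restore token order
    return out


rng = np.random.default_rng(0)
length, channels, window, shift = 8, 4, 4, 2
wq, wk, wv = (rng.standard_normal((channels, channels)) for _ in range(3))
x = rng.standard_normal((length, channels))  # 8 spectral tokens, 4 features each
y_w = swin_1d_block(x, window, 0, wq, wk, wv)       # windows x[0:4], x[4:8]
y_sw = swin_1d_block(x, window, shift, wq, wk, wv)  # windows x[2:6], x[6:8]+x[0:2] (masked apart)
print(y_w.shape, y_sw.shape)
```

Alternating an unshifted block with a shifted one lets information cross window boundaries in the second block while each attention call stays local to a window. The cyclic roll keeps the number of windows fixed instead of padding the sequence, and the mask prevents tokens that are only adjacent because of the wrap-around from attending to each other.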
What problem does this paper attempt to address?