Hyperspectral Image Classification Using Spectral–Spatial Token Enhanced Transformer with Hash-Based Positional Embedding

Ke Wu, Jiayuan Fan, Peng Ye, Mingzhen Zhu
DOI: https://doi.org/10.1109/tgrs.2023.3258488
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Hyperspectral image (HSI) classification aims to assign a land-cover category to each pixel. The transformer architecture has been successfully introduced for the HSI classification task with promising performance. However, existing transformer-based HSI classification methods still fail to fully explore both the spectral and the spatial information in HSIs. To this end, we propose a spectral–spatial token enhanced transformer (SSTE-Former) with hash-based positional embedding, which is the first to exploit multiscale spectral–spatial information in depth for transformer-based HSI classification. Specifically, SSTE-Former accepts multiscale HSI cubes centered on the target pixel, which are preprocessed by principal component analysis (PCA). Then, a purpose-designed multiscale convolutional neural network (CNN) architecture extracts short-range spectral–spatial features and generates token embeddings. In parallel, a novel hash-based spatially enhanced positional embedding tailored for HSI cubes is developed to model the correlations within and across multiscale token embeddings. Finally, the multiscale token embeddings and hash-based positional embeddings are concatenated, flattened, and fed into the transformer encoder for long-range spectral–spatial feature fusion. We conduct extensive experiments on four benchmark HSI datasets and achieve superior performance compared with state-of-the-art HSI classification methods.
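The input stage described in the abstract (PCA along the spectral axis, then multiscale cubes centered on the target pixel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_reduce` and `multiscale_cubes`, the cube sizes (5, 9, 13), the number of retained components (8), and the use of reflect padding at image borders are all assumptions chosen for illustration, since the abstract does not specify them. The CNN tokenizer, the hash-based positional embedding, and the transformer encoder are not sketched here.

```python
import numpy as np

def pca_reduce(hsi, n_components):
    """Project the spectral axis of an (H, W, B) image onto its top principal components."""
    h, w, b = hsi.shape
    flat = hsi.reshape(-1, b).astype(np.float64)
    flat -= flat.mean(axis=0)  # center each spectral band
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)

def multiscale_cubes(image, row, col, sizes=(5, 9, 13)):
    """Return one square cube per (odd) size, each centered on pixel (row, col).

    The sizes and the reflect padding are illustrative assumptions.
    """
    pad = max(sizes) // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    r, c = row + pad, col + pad  # target pixel location in the padded image
    cubes = []
    for s in sizes:
        half = s // 2
        cubes.append(padded[r - half:r + half + 1, c - half:c + half + 1, :])
    return cubes

# Synthetic stand-in for an HSI scene: 20 x 20 pixels, 30 spectral bands.
rng = np.random.default_rng(0)
hsi = rng.random((20, 20, 30))
reduced = pca_reduce(hsi, n_components=8)
cubes = multiscale_cubes(reduced, row=0, col=19)  # a corner pixel, to exercise padding
```

Each cube keeps the target pixel at its spatial center, so the same pixel is seen with progressively larger neighborhoods; a downstream model would consume these cubes as its multiscale input.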