Abstract: Hyperspectral image (HSI) classification assigns each pixel of an image to a land cover type by analyzing its rich spectral information, and feature extraction is the key step. Because they excel at exploiting long-range dependencies, transformer-based methods have attracted significant attention in the field. However, their performance may be restricted by limited local sensitivity, a high computational burden, interference from heterogeneous spectra, and random initialization of the class token without prior knowledge. To address these issues, this study introduces the Dual-Layer Spectral-Spatial Transformer, an architecture designed to extract and model features comprehensively. First, to address limited local sensitivity, we propose a dual-layer transformer architecture in which the inner Pixel-Transformer ensures adequate extraction of local features and the outer Patch-Transformer captures joint spectral-spatial features, thereby strengthening global context modeling. This dual-layer cascade not only provides balanced enhancement in feature extraction and modeling, but also alleviates the computational burden of self-attention. Second, we incorporate a feature selector to mitigate the influence of heterogeneous spectra. Third, the inner Pixel-Transformer uses the spectral vector of the target pixel as its class token, which enhances feature representation and avoids initializing the class token randomly without prior knowledge. Experimental results on four public HSI benchmark datasets show that our model outperforms state-of-the-art methods, with improvements ranging from 0.86% to 3.9%, and achieves excellent classification results at the boundaries between different land cover types.
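
The dual-layer idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 3x3 sub-patch size, the single-head attention with random untrained weights, and the use of the class-token output as the sub-patch summary are all assumptions made for illustration, and the feature selector is omitted. The sketch shows only the data flow: an inner attention pass over the pixels of each sub-patch, with the target pixel's spectrum as the class token, followed by an outer attention pass over the resulting patch tokens.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def random_weights(rng, d_in, d_out):
    # Untrained projection matrices (assumption: stand-ins for learned weights).
    return [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in) for _ in range(3)]

def dual_layer_sketch(cube, sub=3, dim=16, seed=0):
    """cube: (H, W, bands) neighborhood centered on the pixel to classify."""
    rng = np.random.default_rng(seed)
    h, w, bands = cube.shape
    target = cube[h // 2, w // 2]          # spectral vector of the target pixel
    inner_w = random_weights(rng, bands, dim)
    outer_w = random_weights(rng, dim, dim)

    patch_tokens = []
    for i in range(0, h - sub + 1, sub):
        for j in range(0, w - sub + 1, sub):
            pixels = cube[i:i + sub, j:j + sub].reshape(-1, bands)
            # Class token is the target spectrum, not a random vector.
            tokens = np.vstack([target, pixels])
            out = attention(tokens, *inner_w)   # inner "pixel" layer
            patch_tokens.append(out[0])         # class-token output per sub-patch
    patch_tokens = np.stack(patch_tokens)
    return attention(patch_tokens, *outer_w)    # outer "patch" layer

# Hypothetical 9x9 neighborhood with 30 spectral bands.
cube = np.random.default_rng(1).random((9, 9, 30))
features = dual_layer_sketch(cube)
print(features.shape)
```

Under these assumptions, the inner layer computes 9 attention maps of 10 x 10 scores and the outer layer one map of 9 x 9 scores, 981 scores in total, whereas flat attention over all 81 pixels would need 81 x 81 = 6561. This is one way to read the abstract's claim that the cascade reduces the cost of self-attention.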

DGT: Deformable Graph Transformer for Hyperspectral Image Classification

Dilated Spectral–Spatial Gaussian Transformer Net for Hyperspectral Image Classification

GraphGST: Graph Generative Structure-Aware Transformer for Hyperspectral Image Classification

A graph-guided transformer based on dual-stream perception for hyperspectral image classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Global–Local 3-D Convolutional Transformer Network for Hyperspectral Image Classification

RDTN: Residual Densely Transformer Network for hyperspectral image classification

Hypergraph Transformer for Semi-Supervised Classification

GTFN: GCN and Transformer Fusion Network With Spatial-Spectral Features for Hyperspectral Image Classification

Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network

Joint Classification of Hyperspectral and LiDAR Data Based on Adaptive Gating Mechanism and Learnable Transformer

Deep global-local transformer network combined with extended morphological profiles for hyperspectral image classification

Two‐branch global spatial–spectral fusion transformer network for hyperspectral image classification

CS2DT: Cross Spatial–Spectral Dense Transformer for Hyperspectral Image Classification

DCTN: Dual-Branch Convolutional Transformer Network With Efficient Interactive Self-Attention for Hyperspectral Image Classification

Expeditious Hyperspectral Image Classification With Inner and Outer Layered Transformer Using Feature Enhancement

Cascaded Convolution-Based Transformer With Densely Connected Mechanism for Spectral–Spatial Hyperspectral Image Classification

Hyperspectral Image Classification Based on Multibranch Attention Transformer Networks

Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification

A Dual-Branch Multiscale Transformer Network for Hyperspectral Image Classification