Abstract: Hyperspectral image (HSI) classification is the process of segmenting an image into distinct land cover types by analyzing the rich spectral information of each pixel, and its key step is feature extraction. Owing to their superior ability to exploit long-range dependencies, transformer-based methods have attracted significant attention in the field. However, their performance may be restricted by limited local sensitivity, a high computational burden, the influence of heterogeneous spectra, and random initialization of the class token without prior knowledge. To address these issues, this study introduces the Dual-Layer Spectral-Spatial Transformer architecture, which is designed to extract and model features comprehensively. First, to address limited local sensitivity, we propose a dual-layer transformer architecture in which the inner Pixel-Transformer ensures adequate extraction of local features and the outer Patch-Transformer captures joint spectral-spatial features, thereby strengthening global context modeling. This dual-layer cascade not only balances feature extraction and modeling, but also alleviates the computational burden of self-attention operations. Second, we incorporate a feature selector to mitigate the influence of heterogeneous spectra. Third, the inner Pixel-Transformer enhances feature representation by using the spectral vector of the target pixel as the class token, which avoids randomly initializing the class token without prior knowledge. Experimental results on four public HSI benchmark datasets show that our model outperforms state-of-the-art methods, with improvements ranging from 0.86% to 3.9%, and achieves excellent classification results at the boundaries between different land cover types.
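The class-token idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 5x5 patch size, the 30 spectral bands, the projection width of 16, the single attention head, and the random projection weights are all assumptions chosen only for illustration. The sketch takes the spectral vector of the center (target) pixel of a patch, places it at the front of the pixel-token sequence in place of a randomly initialized class token, and applies one scaled dot-product self-attention step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_tokens_with_center_class_token(patch):
    """Build a token sequence from an (H, W, B) patch with odd H and W.

    Hypothetical helper: the class token is the spectral vector of the
    center pixel rather than a randomly initialized vector.
    Returns an array of shape (1 + H*W, B).
    """
    h, w, b = patch.shape
    center = patch[h // 2, w // 2]      # spectral vector of the target pixel
    tokens = patch.reshape(h * w, b)    # one token per pixel
    return np.vstack([center[None, :], tokens])

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention (assumed simplification).
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
patch = rng.random((5, 5, 30))          # assumed 5x5 patch with 30 bands
tokens = pixel_tokens_with_center_class_token(patch)

d = 16                                  # assumed projection width
w_q, w_k, w_v = (rng.standard_normal((30, d)) * 0.1 for _ in range(3))
out = self_attention(tokens, w_q, w_k, w_v)
class_feature = out[0]                  # attended feature at the class-token position
```

In this sketch the first output row plays the role of the class-token feature that a classifier head would consume; a full model would add learned projections, multiple heads, and the outer patch-level transformer.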

LGFormer: Local-to-Global Transformer for Hyperspectral Image Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Global–Local 3-D Convolutional Transformer Network for Hyperspectral Image Classification

Hyper-LGNet: Coupling Local and Global Features for Hyperspectral Image Classification

ELS2T: Efficient Lightweight Spectral-Spatial Transformer for Hyperspectral Image Classification

Local-global feature fusion network for hyperspectral image classification

Global and Local Attention-Based Transformer for Hyperspectral Image Change Detection

Deep global-local transformer network combined with extended morphological profiles for hyperspectral image classification

Two‐branch global spatial–spectral fusion transformer network for hyperspectral image classification

Spectral–Spatial Feature Extraction for Hyperspectral Image Classification Using Enhanced Transformer with Large-Kernel Attention

Expeditious Hyperspectral Image Classification With Inner and Outer Layered Transformer Using Feature Enhancement

Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network

Hierarchical Attention Transformer for Hyperspectral Image Classification

A Lightweight Transformer Network for Hyperspectral Image Classification

SWFormer: Stochastic Windows Convolutional Transformer for Hybrid Modality Hyperspectral Classification

MHIAIFormer: Multihead Interacted and Adaptive Integrated Transformer With Spatial-Spectral Attention for Hyperspectral Image Classification

When Multigranularity Meets Spatial–Spectral Attention: A Hybrid Transformer for Hyperspectral Image Classification

A U-Shaped Convolution-Aided Transformer with Double Attention for Hyperspectral Image Classification

Selective Transformer for Hyperspectral Image Classification