Abstract: Hyperspectral images (HSIs) contain spatially structured information and pixel-level sequential spectral attributes. The continuous spectral features span hundreds of wavelength bands, and the differences between spectra are essential for fine-grained classification. Because backbone networks have a limited receptive field, convolutional neural network (CNN)-based HSI classification methods, with their fixed kernel sizes and limited number of layers, struggle to model spectral-wise long-range dependencies. Recently, the self-attention mechanism of the transformer framework has been introduced to compensate for these limitations of CNNs and to mine the long-term dependencies of spectral signatures. Consequently, many joint CNN and transformer architectures for HSI classification have been proposed to obtain the merits of both networks. However, these architectures have difficulty capturing spatial–spectral correlation, and the CNNs distort the continuous nature of the spectral signature because they over-focus on spatial information, so the transformer can easily encounter bottlenecks in modeling spectral-wise similarity and long-range dependencies. To address this problem, we propose a neighborhood enhancement hybrid transformer (NEHT) network. In particular, a simple 2D convolution module is adopted to achieve dimensionality reduction while minimizing the distortion of the original spectral distribution that stacked CNNs cause. Then, we extract group-wise spatial–spectral features in a parallel design to enhance the representation capability of each token. Furthermore, a feature fusion strategy is introduced to amplify subtle discrepancies between spectra. Finally, the self-attention of the transformer is employed to mine the long-term dependencies between the enhanced feature sequences. Extensive experiments are performed on three well-known datasets, and the proposed NEHT network shows superiority over state-of-the-art (SOTA) methods. Specifically, our proposed method outperforms the SOTA method by an average of 0.46%, 1.05%, and 0.75% in overall accuracy, average accuracy, and kappa coefficient, respectively.
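
The three metrics named in the abstract (overall accuracy, average accuracy, and the kappa coefficient) are standard in HSI classification. As a minimal sketch of how they are commonly computed from ground-truth and predicted labels, the following Python function may help; it is not the authors' evaluation code, and the function name `classification_metrics` and the sample labels are illustrative assumptions.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (overall accuracy, average accuracy, Cohen's kappa).

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    true_counts = Counter(y_true)   # samples per ground-truth class
    pred_counts = Counter(y_pred)   # samples per predicted class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Overall accuracy: fraction of all samples classified correctly.
    oa = sum(correct.values()) / n

    # Average accuracy: mean of per-class recall over ground-truth classes.
    aa = sum(correct[c] / true_counts[c] for c in true_counts) / len(true_counts)

    # Kappa: agreement corrected for the agreement expected by chance.
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 1.0

    return oa, aa, kappa


if __name__ == "__main__":
    # Made-up labels for three classes, for demonstration only.
    y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]
    oa, aa, kappa = classification_metrics(y_true, y_pred)
    print(f"OA={oa:.4f}  AA={aa:.4f}  kappa={kappa:.4f}")
```

Average accuracy weights every class equally, so on class-imbalanced scenes it can differ noticeably from overall accuracy, which is why the two are usually reported together with kappa.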

SPFormer: Self-Pooling Transformer for Few-Shot Hyperspectral Image Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

A Lightweight Transformer Network for Hyperspectral Image Classification

Hyperspectral Image Classification via Spectral Pooling and Hybrid Transformer

Spectral-Swin Transformer with Spatial Feature Extraction Enhancement for Hyperspectral Image Classification

Expeditious Hyperspectral Image Classification With Inner and Outer Layered Transformer Using Feature Enhancement

ELS2T: Efficient Lightweight Spectral-Spatial Transformer for Hyperspectral Image Classification

MHIAIFormer: Multi-Head Interacted and Adaptive Integrated Transformer with Spatial-Spectral Attention for Hyperspectral Image Classification

Selective Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Using Spectral–Spatial Token Enhanced Transformer with Hash-Based Positional Embedding

MHCFormer: Multiscale Hierarchical Conv-Aided Fourierformer for Hyperspectral Image Classification

Cascaded Convolution-Based Transformer With Densely Connected Mechanism for Spectral–Spatial Hyperspectral Image Classification

A Lightweight 1-D Convolution Augmented Transformer with Metric Learning for Hyperspectral Image Classification

Fast Hyperspectral Image Classification Combining Transformers and SimAM-Based CNNs

Spectral-Spatial Attention Transformer with Dense Connection for Hyperspectral Image Classification

MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification

Multilevel Class Token Transformer With Cross TokenMixer for Hyperspectral Images Classification

Hyper-ES2T: Efficient Spatial–Spectral Transformer for the classification of hyperspectral remote sensing images

MHST: Multiscale Head Selection Transformer for Hyperspectral and LiDAR Classification

Hyperspectral Image Classification Based on Multi-Level Spectral-Spatial Transformer Network