Abstract: In recent years, convolutional neural networks have dominated downstream tasks on hyperspectral remote sensing images thanks to their strong local feature extraction capability. However, convolution operations cannot effectively capture long-range dependencies, and repeatedly stacking convolutional layers to build a hierarchical structure only alleviates this problem without solving it. Meanwhile, the Transformer addresses exactly this limitation and offers a way to capture long-distance dependencies between tokens. Although the Transformer has recently been introduced into hyperspectral image (HSI) classification, most related works exploit only a single kind of information, spatial or spectral, and neglect to explore the optimal way to fuse these two different-level features. Therefore, to fully exploit the abundant spatial information and spectral correlations in HSIs both effectively and efficiently, we present the first attempt to explore the Transformer architecture in a dual-branch manner and propose a novel bilateral classification network named Hyper-ES2T. In addition, the Aggregated Feature Enhancement Module is proposed for effective feature aggregation and further spatial–spectral feature enhancement. Furthermore, to tackle the high computational cost of the vanilla self-attention block in the Transformer, we design the Efficient Multi-Head Self-Attention block, which pursues a trade-off between model accuracy and efficiency. The proposed Hyper-ES2T reaches new state-of-the-art performance and outperforms previous methods by a significant margin on four benchmark datasets for HSI classification, which demonstrates its strong generalization ability and superior feature representation capability.
We anticipate that this work provides new insight into designing Transformer-based network architectures with superior performance and high model efficiency, which may inspire further research in HSI processing. The source code will be available at https://github.com/Wenxuan-1119/Hyper-ES2T.
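
The computational cost mentioned in the abstract comes from the two matrix products in self-attention, both of which scale with the square of the token count. The abstract does not describe how the Efficient Multi-Head Self-Attention block lowers that cost, so the sketch below is not the authors' design. It only counts multiply-accumulates for one attention head and compares vanilla attention against a generic variant that shortens the key/value sequence. The function name, the `kv_reduction` parameter, and the 9x9 patch size are all assumptions made for illustration.

```python
def attention_matmul_cost(num_tokens: int, dim: int, kv_reduction: int = 1) -> int:
    """Count multiply-accumulates for the two matrix products in one attention head.

    kv_reduction is a hypothetical factor by which keys and values are
    shortened before attention; 1 corresponds to vanilla self-attention.
    Projection layers and the softmax itself are not counted.
    """
    if num_tokens % kv_reduction != 0:
        raise ValueError("num_tokens must be divisible by kv_reduction")
    num_kv = num_tokens // kv_reduction
    scores = num_tokens * num_kv * dim    # Q @ K^T
    weighted = num_tokens * num_kv * dim  # softmax(scores) @ V
    return scores + weighted


# Assumed example: a 9x9 spatial patch gives 81 tokens, each of width 64.
vanilla = attention_matmul_cost(num_tokens=81, dim=64)
reduced = attention_matmul_cost(num_tokens=81, dim=64, kv_reduction=9)
print(vanilla, reduced, vanilla // reduced)
```

Under these assumptions the cost falls linearly with the reduction factor, since only the key/value length shrinks while the query length is unchanged. Whether accuracy is preserved is an empirical question that this count does not address.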

MS2I2Former: Multiscale Spatial-Spectral Information Interactive Transformer for Hyperspectral Image Classification

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

CS2DT: Cross Spatial–Spectral Dense Transformer for Hyperspectral Image Classification

MHIAIFormer: Multi-Head Interacted and Adaptive Integrated Transformer with Spatial-Spectral Attention for Hyperspectral Image Classification

A Dual-Branch Multiscale Transformer Network for Hyperspectral Image Classification

A Spatial–Spectral Transformer for Hyperspectral Image Classification Based on Global Dependencies of Multi-Scale Features

MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification

MHCFormer: Multiscale Hierarchical Conv-Aided Fourierformer for Hyperspectral Image Classification

ELS2T: Efficient Lightweight Spectral-Spatial Transformer for Hyperspectral Image Classification

MASSFormer: Memory-Augmented Spectral-Spatial Transformer for Hyperspectral Image Classification

Expeditious Hyperspectral Image Classification With Inner and Outer Layered Transformer Using Feature Enhancement

Hyper-ES2T: Efficient Spatial–Spectral Transformer for the classification of hyperspectral remote sensing images

Multilevel Class Token Transformer With Cross TokenMixer for Hyperspectral Images Classification

MSCC-ViT: A Multiscale Visual-Transformer Network Using Convolution Crossing Attention for Hyperspectral Unmixing

MSMT-LCL: Multiscale Spatial-Spectral Masked Transformer With Local Contrastive Learning for Hyperspectral Image Classification

Reciprocal transformer for hyperspectral and multispectral image fusion

Two‐branch global spatial–spectral fusion transformer network for hyperspectral image classification

DCTN: Dual-Branch Convolutional Transformer Network With Efficient Interactive Self-Attention for Hyperspectral Image Classification