Abstract: In hyperspectral image (HSI) classification, every pixel sample is assigned to a land-cover type. CNN-based techniques for HSI classification have notably advanced the field through their strong feature representation capabilities. However, acquiring deep features remains a challenge for these CNN-based methods. In contrast, transformer models are adept at extracting high-level semantic features, offering a complementary strength. This paper's main contribution is an HSI classification model that includes two convolutional blocks, a Gate-Shift-Fuse (GSF) block, and a transformer block. The model leverages the strengths of CNNs in local feature extraction and of transformers in long-range context modelling. The GSF block is designed to strengthen the extraction of local and global spatial-spectral features. An effective attention mechanism module is also proposed to enhance the extraction of information from HSI cubes. The proposed method is evaluated on four well-known datasets (Indian Pines, Pavia University, WHU-Hi-LongKou, and WHU-Hi-HanChuan), demonstrating that the proposed framework achieves superior results compared to other models.
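The abstract speaks of classifying every pixel and of extracting information from HSI cubes. A minimal sketch of how such a cube might be cut out around one pixel is given below. It assumes, as is common in patch-based HSI classification but not stated in the abstract, that the cube is a square spatial neighbourhood centred on the pixel and spanning all bands. The function name `extract_cube`, the nested-list layout `image[row][col][band]`, and the zero padding at image borders are illustrative assumptions, not details from the paper.

```python
def extract_cube(image, row, col, patch_size):
    """Return the patch_size x patch_size x bands cube centred on (row, col).

    `image` is a nested list indexed as image[r][c][b]. Positions outside
    the image are filled with zeros (zero padding is an assumption here).
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so the pixel sits at the centre")
    height, width = len(image), len(image[0])
    bands = len(image[0][0])
    half = patch_size // 2
    cube = []
    for r in range(row - half, row + half + 1):
        cube_row = []
        for c in range(col - half, col + half + 1):
            if 0 <= r < height and 0 <= c < width:
                cube_row.append(list(image[r][c]))  # copy the spectrum
            else:
                cube_row.append([0.0] * bands)      # outside the scene
        cube.append(cube_row)
    return cube


# Tiny synthetic scene: 4 x 5 pixels, 3 bands; each value encodes (row, col, band).
scene = [[[100 * r + 10 * c + b for b in range(3)] for c in range(5)]
         for r in range(4)]

# Cube for the top-left pixel; its upper and left neighbours are padding.
corner_cube = extract_cube(scene, row=0, col=0, patch_size=3)
```

In a setup like this, the label of the centre pixel would be the classification target for the whole cube, and a model would be trained on one cube per labelled pixel.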


### What problems does this paper attempt to solve?

This paper aims to address the challenges in hyperspectral image (HSI) classification. Specifically, the authors attempt to improve HSI classification performance by combining the advantages of convolutional neural network (CNN) and Transformer models. The main problems the paper attempts to solve are:

1. **Challenges in deep feature extraction**:
   - CNNs perform well in HSI classification, but still face challenges in obtaining deep-level features.
   - The Transformer model is good at extracting high-level semantic features, but may not fully utilize local spatial information when used alone.
2. **Fusion of local and global features**:
   - The proposed model needs to effectively fuse local spatial features and global context information to improve classification accuracy.
3. **Improvement of the attention mechanism**:
   - Existing methods do not fully extract local and global information when dealing with hyperspectral data.
   - A more effective attention mechanism module is needed to enhance the extraction of information from the HSI cube.
4. **Dealing with high-dimensional data and noise**:
   - HSI data is high-dimensional and easily affected by noise.
   - The model needs to be robust and maintain high accuracy in a complex data environment.

### Main contributions of the paper

To address the above challenges, the paper makes the following main contributions:

- **Introduction of Gate-Shift-Fuse (GSF) blocks**: Designed specifically for HSI classification, the GSF block enhances the ability of the CNN and Transformer to extract local and global spatial-spectral features.
- **An effective attention mechanism module**: This module enables the model to better extract local and global information from the HSI cube.
- **Experimental verification**: Evaluated on four well-known datasets (Indian Pines, Pavia University, WHU-Hi-LongKou, and WHU-Hi-HanChuan), the proposed framework outperforms other models.

Through these improvements, the paper aims to provide a more efficient and accurate method for HSI classification, with broad application prospects in fields such as agriculture, biomedical imaging, mineral exploration, food safety, and military reconnaissance.
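The summary names the Gate-Shift-Fuse block without describing its internals. The three steps in its name can be illustrated with a toy on a one-dimensional sequence of feature values. This is not the paper's block: the sigmoid gate, the one-position shift with zero fill, the fixed fusion weight, and the names `gate_shift_fuse`, `gate_scale`, and `fuse_weight` are all assumptions made for illustration. A real block would presumably learn its gate and fusion weights and operate on multi-channel feature maps.

```python
import math


def gate_shift_fuse(features, gate_scale=1.0, fuse_weight=0.5):
    """Toy gate-shift-fuse over a 1-D sequence of feature values.

    1. Gate:  split each value into a gated part and a residual part.
    2. Shift: move the gated part one position along the sequence.
    3. Fuse:  blend the shifted part with the residual part.
    All three choices below are illustrative assumptions.
    """
    # Gate in (0, 1): sigmoid of the scaled feature value.
    gates = [1.0 / (1.0 + math.exp(-gate_scale * x)) for x in features]
    gated = [g * x for g, x in zip(gates, features)]
    residual = [x - y for x, y in zip(features, gated)]

    # Shift the gated part right by one position, zero-filling the first slot,
    # so each position receives information from its neighbour.
    shifted = [0.0] + gated[:-1]

    # Fuse: fixed weighted sum of the shifted and residual parts.
    return [fuse_weight * s + (1.0 - fuse_weight) * r
            for s, r in zip(shifted, residual)]


# With gate_scale=0 every gate equals 0.5, which makes the steps easy to trace.
fused = gate_shift_fuse([0.0, 2.0, -2.0], gate_scale=0.0, fuse_weight=0.5)
```

The point of the toy is the data flow: the gate decides how much of each feature is routed to a neighbouring position, and the fusion step recombines that routed part with what stayed in place.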

Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

Fast Hyperspectral Image Classification Combining Transformers and SimAM-Based CNNs

Bridging CNN and Transformer With Cross-Attention Fusion Network for Hyperspectral Image Classification

ELS2T: Efficient Lightweight Spectral-Spatial Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Based on Interactive Transformer and CNN With Multilevel Feature Fusion Network

CNN and Transformer interaction network for hyperspectral image classification

Double-branch feature fusion transformer for hyperspectral image classification

Hyperspectral Image Classification via Spectral Pooling and Hybrid Transformer

A Dual-Branch Multiscale Transformer Network for Hyperspectral Image Classification

A Center-Masked Transformer for Hyperspectral Image Classification

Adaptive Pixel-Level and Superpixel-Level Feature Fusion Transformer for Hyperspectral Image Classification

Adaptive Learnable Spectral–Spatial Fusion Transformer for Hyperspectral Image Classification

GTFN: GCN and Transformer Fusion Network With Spatial-Spectral Features for Hyperspectral Image Classification

MHIAIFormer: Multi-Head Interacted and Adaptive Integrated Transformer with Spatial-Spectral Attention for Hyperspectral Image Classification

Hyperspectral Image Transformer Classification Networks

Deep global-local transformer network combined with extended morphological profiles for hyperspectral image classification

Spectral-Swin Transformer with Spatial Feature Extraction Enhancement for Hyperspectral Image Classification

Spectral-Spatial Attention Transformer with Dense Connection for Hyperspectral Image Classification
