Abstract: Transformer models are increasingly used in hyperspectral image (HSI) classification because of their strong global feature extraction capabilities. However, these networks still struggle to recognize locally complex feature shapes at different scales and to handle the linear and nonlinear correlations between spectral channels. To this end, we propose a multiscale spatial-spectral information interaction transformer (MS2I2Former) architecture. The architecture combines lightweight convolution with the Transformer, integrates local and global multiscale spatial features with spectral information, and enables interaction between different scales. We design a multiscale spatial-spectral information interaction (MS2I2) module, which captures multiscale spatial-spectral features by combining deep convolutions that use kernels of different sizes and orientations with frequency-domain information. Building on this, we propose a distance mean cross-covariance representation (DMC2R) based on distance covariance, which aims to explore the linear and nonlinear relationships between different spectral channels. Taking into account both the number of convolutional kernel parameters and the need for comprehensive extraction of joint spectral-spatial features, we develop the hybrid convolution (HC) module, which combines multiple lightweight convolutions to extract deeper spectral-spatial features. To model complex long-range feature relationships, we propose the multiscale double cross-symmetric transformer (MDCST) module. This module feeds the rich feature representations obtained after multiscale mapping into double cross-symmetric attention (DCSA), which enhances the interaction and fusion among features to capture a wider range of feature dependencies. Experimental results on four public datasets show that MS2I2Former achieves excellent classification results with fewer training samples than existing methods. The source code is available at https://github.com/cslxju/MS2I2Former.
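
The abstract motivates DMC2R by noting that distance covariance can capture both linear and nonlinear relationships between spectral channels. The following is a minimal sketch of the standard sample distance correlation (the normalized form of distance covariance) computed between channels. It is not the DMC2R module itself, whose exact formulation is not given in the abstract; the function names (`distance_covariance_sq`, `distance_correlation`, `channel_dcor_matrix`) and the toy data are assumptions made purely for illustration.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances of a 1-D sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n           # grand mean of all distances
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two equally long 1-D samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 for a constant input."""
    dxy = distance_covariance_sq(x, y)
    denom = math.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dxy, 0.0) / denom)


def pearson(x, y):
    """Plain Pearson correlation, shown only for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def channel_dcor_matrix(pixels):
    """Pairwise distance correlation between channels.

    `pixels` is a list of spectra (hypothetical layout: one list of
    per-channel values per pixel).
    """
    channels = list(zip(*pixels))
    c = len(channels)
    return [[distance_correlation(channels[i], channels[j]) for j in range(c)]
            for i in range(c)]


# Toy example: channel 1 depends linearly on channel 0, channel 2 quadratically.
toy_pixels = [[x, 2 * x + 1, x * x] for x in (-2, -1, 0, 1, 2)]
toy_matrix = channel_dcor_matrix(toy_pixels)
```

In this toy example the Pearson correlation between channel 0 and channel 2 is zero, because the dependence is purely quadratic, while the distance correlation between them is clearly positive. This is the kind of nonlinear inter-channel relationship that ordinary covariance misses.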

CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Joint Multi-Scale CNN and Vision Transformer for Hyperspectral Image Classification

Hybrid Multi-Scale Spatial-Spectral Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Using Hierarchical Spatial-Spectral Transformer

A Spatial–Spectral Transformer for Hyperspectral Image Classification Based on Global Dependencies of Multi-Scale Features

CS2DT: Cross Spatial–Spectral Dense Transformer for Hyperspectral Image Classification

Cross-Domain Hyperspectral Image Classification Based on Transformer

Multi-granularity Vision Transformer Via Semantic Token for Hyperspectral Image Classification

Hyperspectral Image Classification Using Group-Aware Hierarchical Transformer

Multi-Scale Super Token Transformer for Hyperspectral Image Classification

A Dual-Branch Multiscale Transformer Network for Hyperspectral Image Classification

Multiattention Joint Convolution Feature Representation with Lightweight Transformer for Hyperspectral Image Classification

MS2I2Former: Multiscale Spatial-Spectral Information Interactive Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Using Spectral–Spatial Token Enhanced Transformer with Hash-Based Positional Embedding

Multi-scale Spectral-Spatial Dual-Transformer Network for Hyperspectral Image Classification

3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification

Investigation of Hierarchical Spectral Vision Transformer Architecture for Classification of Hyperspectral Imagery

End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

Deep Spectral Spatial Feature Enhancement Through Transformer for Hyperspectral Image Classification