Abstract: Although convolutional neural networks (CNNs) have proven successful for hyperspectral image classification (HSIC), their limited receptive field makes it difficult for them to capture long-range global dependencies among hyperspectral image (HSI) pixels and across spectral bands. Transformers compensate well for this shortcoming, but compared with CNNs they lack image-specific inductive biases (i.e., locality and translation equivariance) as well as contextual position information. To address these challenges, we introduce CNSST, a simply structured, end-to-end convolutional network and spectral–spatial transformer architecture for HSIC. CNSST consists of two essential components: a simple 3D-CNN-based hierarchical feature fusion network and a spectral–spatial transformer that introduces inductive bias information. The former establishes the correlation between spectral and spatial (SAS) information while capturing richer inductive bias and more discriminative local spectral–spatial hierarchical features. The latter models global dependencies among HSI pixels and, by introducing inductive bias information, strengthens the extraction of local information. Specifically, spectral and inductive bias information is incorporated into the transformer's multi-head self-attention (MHSA) mechanism, making the attention both spectrally aware and location-aware. In addition, the Lion optimizer is used to further improve the classification performance of CNSST. Extensive experiments on three publicly available hyperspectral datasets show that CNSST outperforms other state-of-the-art approaches.
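The abstract names the Lion optimizer but gives none of its settings. As a rough illustration of what Lion does, here is a minimal sketch of its published update rule: interpolate momentum and gradient, step by the sign of that interpolation, then update the momentum at a separate rate. The function name `lion_step`, the plain-list parameter representation, and the defaults `beta1=0.9`, `beta2=0.99` are assumptions for illustration, not the configuration used for CNSST.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """Apply one Lion update in place (hypothetical helper, illustrative only).

    params, grads, momentum: equal-length lists of floats.
    params and momentum are modified in place; nothing is returned.
    """
    for i, (p, g, m) in enumerate(zip(params, grads, momentum)):
        # Interpolate between the stored momentum and the current gradient.
        c = beta1 * m + (1.0 - beta1) * g
        # Only the sign of c is used, so each coordinate moves by exactly lr,
        # plus a decoupled weight-decay term.
        params[i] = p - lr * (sign(c) + weight_decay * p)
        # The momentum itself is tracked with a separate rate beta2.
        momentum[i] = beta2 * m + (1.0 - beta2) * g


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x, m = [0.0], [0.0]
    for _ in range(50):
        lion_step(x, [2.0 * (x[0] - 3.0)], m, lr=0.1)
    print(f"x after 50 Lion steps: {x[0]:.3f}")
```

Because the step depends only on the sign of the interpolated momentum, its size is set by the learning rate rather than by the gradient magnitude. This is the main behavioral difference from Adam-style optimizers.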

Semi-Supervised Co-Training Model Using Convolution and Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Using Hierarchical Spatial-Spectral Transformer

Multiscale and Cross-Level Attention Learning for Hyperspectral Image Classification

Hybrid Spectral-Spatial Convolutional Network and Transformer with Mixup Regularization for Hyperspectral Image Classification

MSMT-LCL: Multiscale Spatial-Spectral Masked Transformer With Local Contrastive Learning for Hyperspectral Image Classification

CTF-SSCL: CNN-Transformer for Few-Shot Hyperspectral Image Classification Assisted by Semisupervised Contrastive Learning

Multi-scale Spectral-Spatial Dual-Transformer Network for Hyperspectral Image Classification

CNN and Transformer Hybrid Network for Hyperspectral Image Classification

Semi-Active Convolutional Neural Networks for Hyperspectral Image Classification

Hyperspectral Remote-Sensing Classification Combining Transformer and Multiscale Residual Mechanisms

Grouped Bidirectional LSTM Network and Multi-Stage Fusion Convolutional Transformer for Hyperspectral Image Classification

Hyperspectral Image Classification Based on Multi-Scale Convolutional Features and Multi-Attention Mechanisms

Global–Local 3-D Convolutional Transformer Network for Hyperspectral Image Classification

Main–Sub Transformer with Spectral–Spatial Separable Convolution for Hyperspectral Image Classification

ELS2T: Efficient Lightweight Spectral-Spatial Transformer for Hyperspectral Image Classification

A Global+ Multiscale Hybrid Network for Hyperspectral Image Classification

MSCC-ViT: A Multiscale Visual-Transformer Network Using Convolution Crossing Attention for Hyperspectral Unmixing

End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

CTACL: Hyperspectral Image Change Detection Based on Adaptive Contrastive Learning