Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network

Zhuoyi Zhao, Xiang Xu, Shutao Li, Antonio Plaza
DOI: https://doi.org/10.1109/tgrs.2024.3377610
IF: 8.2
2024-03-30
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Recently, vision transformer (ViT)-based deep learning (DL) models have achieved remarkable performance gains in hyperspectral image classification (HSIC) owing to their ability to model long-range dependencies and extract global spatial features. However, ViT is built from a stack of Transformer blocks and must learn a large number of parameters when processing hyperspectral data. In addition, the inherent modeling of global correlation in the Transformer neglects the effective representation of local spatial and spectral features. To address these issues, we propose a lightweight ViT network known as groupwise separable convolutional ViT (GSC-ViT). First, a groupwise separable convolution (GSC) module, which combines grouped pointwise convolution (GPWC) and group convolution, is designed to significantly decrease the number of convolutional kernel parameters and to effectively capture local spectral–spatial information in hyperspectral images (HSIs). Second, a groupwise separable multihead self-attention (GSSA) module is employed to replace the conventional multihead self-attention (MSA) in ViT, in which the groupwise self-attention (GSA) provides local spatial feature extraction and the pointwise self-attention (PWSA) provides global spatial feature extraction. Third, a simple pointwise layer with an enhanced skip-connection mechanism is employed to replace the multilayer perceptron (MLP) layer in all Transformer blocks of ViT, so as to eliminate unnecessary nonlinear transformations and facilitate the fusion of features derived from the GSC and GSSA modules. Extensive experiments on four benchmark hyperspectral datasets show that GSC-ViT achieves surprisingly strong classification performance with relatively few training samples compared with several existing HSIC approaches. The source code is available at https://github.com/flyzzie/TGRS-GSC-VIT.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
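
The abstract states that the GSC module reduces convolutional kernel parameters by pairing a grouped pointwise convolution with a group convolution. A rough sketch of why grouping helps is given below, using the standard weight-count formula for a grouped 2-D convolution, (C_in / groups) x C_out x k x k. The function names, the channel count (64), the kernel size (3), and the group counts (4 and 64) are all illustrative assumptions; the abstract does not give the paper's actual configuration.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=False):
    """Weight count of a 2-D convolution split into `groups` channel groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def gsc_params(c_in, c_out, kernel, pw_groups, conv_groups):
    """Hypothetical GSC-style block: grouped 1x1 conv, then grouped kxk conv."""
    pointwise = conv2d_params(c_in, c_out, 1, groups=pw_groups)
    spatial = conv2d_params(c_out, c_out, kernel, groups=conv_groups)
    return pointwise + spatial


# Assumed sizes for illustration only (not taken from the paper).
standard = conv2d_params(64, 64, 3)                       # 64 * 64 * 9
gsc = gsc_params(64, 64, 3, pw_groups=4, conv_groups=64)  # 16*64 + 1*64*9

print(standard, gsc)  # 36864 1600
```

With these assumed sizes, the two-stage grouped block needs 1,600 weights against 36,864 for a single ungrouped 3x3 convolution, which is the kind of saving the abstract refers to. The actual ratio depends on the group counts the authors chose.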