Joint Multi-Scale CNN and Vision Transformer for Hyperspectral Image Classification

Rui Sun, Jianhong Xiang, Linyu Wang
DOI: https://doi.org/10.1109/iccect60629.2024.10546120
2024-01-01
Abstract: Convolutional neural networks (CNNs) have shown strong performance in hyperspectral image (HSI) classification. However, an HSI contains hundreds of contiguous spectral bands, and CNN-based HSI classification methods neglect the deeper sequential semantic information in the spectrum, which a Transformer can process more effectively. In this article, a new network, MCSS-ViT, is designed to extract vital spectral-spatial features of HSI at different levels. MCSS-ViT combines a multi-scale CNN based on residual and channel attention modules (MCRC) with a vision transformer (ViT); the MCRC comprises a CRC block and an AIC block. First, principal component analysis (PCA) is employed to reduce the spectral dimension of the HSI. Subsequently, the CRC block is constructed to fully learn the spectral-spatial information of patches. Meanwhile, to avoid losing important multi-scale spatial information, the AIC block is developed to capture complementary spatial features. Finally, the SS-Former is introduced to extract global and semantic features from the image. The performance of MCSS-ViT was evaluated on two datasets, and the experiments showed that the proposed method achieved better classification results than other classical methods.
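The preprocessing step named in the abstract (PCA along the spectral axis, followed by patch-based learning) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_reduce` and `extract_patch`, the cube size, the number of retained components (30), the patch size (9), and the reflect padding are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its top principal components."""
    h, w, b = cube.shape
    # Treat every pixel as one sample with B spectral features.
    pixels = cube.reshape(-1, b).astype(np.float64)
    pixels -= pixels.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    return (pixels @ vt[:n_components].T).reshape(h, w, n_components)

def extract_patch(cube, row, col, size):
    """Return the size x size patch centred on (row, col); size is assumed odd."""
    half = size // 2
    # Reflect padding (an assumption) lets border pixels have full patches.
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + size, col:col + size, :]

# Synthetic stand-in for an HSI: 20 x 20 pixels, 100 spectral bands.
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 100))

reduced = pca_reduce(cube, n_components=30)            # spectral dimension 100 -> 30
patch = extract_patch(reduced, row=0, col=0, size=9)   # input for one labelled pixel
print(reduced.shape, patch.shape)
```

In a pipeline of this kind, each labelled pixel would contribute one such patch, which the CNN and Transformer stages then consume; how MCSS-ViT does this in detail is described in the paper itself, not in the abstract.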