Abstract: Convolutional neural networks (CNNs) have dominated hyperspectral image (HSI) classification owing to their powerful feature learning capability. However, their strong local sensitivity is both a strength and a weakness. Recently, vision transformers have achieved impressive performance on various vision problems. Compared with CNNs, they can model long-range dependencies and thus learn richer interactions between spatial locations. Nevertheless, existing transformer-based HSI classification methods focus too heavily on the advantages of the transformer architecture and neglect the importance of local dependencies. In addition, token generation and token mixers in transformer-like architectures have not been adequately explored, which makes it difficult to achieve the best classification performance. To address these problems, a novel multiscale hierarchical conv-aided Fourierformer (MHCFormer) is proposed for HSI classification. To the best of our knowledge, this is the first time that CNNs, transformers, and the Fourier transform have been combined for HSI classification. The proposed MHCFormer consists of three stages: multiscale spectral–spatial token generation, hierarchical token learning, and a classification head. The multiscale spectral–spatial token generation stage transforms the HSI into tokens enriched with multiscale spectral–spatial information. The hierarchical token learning stage explores the multiscale tokens both globally and locally by integrating the design philosophies of transformers and CNNs with the Fourier transform in a single block and stacking these blocks hierarchically. Extensive experiments on the new WHU-Hi-HanChuan dataset and the widely used Indian Pines and Houston 2013 datasets demonstrate the superiority of MHCFormer over other state-of-the-art methods. The code will be made publicly available at https://github.com/Tikiten/MHCFormer.
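The abstract says the hierarchical token learning block mixes tokens both globally and locally by combining transformer, CNN, and Fourier-transform ideas, but it does not spell out the block. As a rough illustration only, here is one common way to build such a token mixer: a global branch that filters the token sequence in the frequency domain (in the spirit of global-filter Fourier mixers such as GFNet), plus a local depthwise-convolution branch, summed with a residual connection. The function names, the per-frequency filter `filt` shared across channels, the kernel size of 3, and the additive fusion are all hypothetical and are not taken from MHCFormer; consult the linked repository for the actual design. A naive O(N²) DFT keeps the sketch dependency-free.

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform of a list of numbers (O(N^2))."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fourier_mix(tokens, filt):
    """Global branch (hypothetical): filter each channel across all N tokens.

    tokens: N tokens, each a list of C channel values.
    filt:   N per-frequency weights shared by all channels (learnable in practice).
    """
    n, c = len(tokens), len(tokens[0])
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        spec = dft([tokens[t][ch] for t in range(n)])
        mixed = idft([s * f for s, f in zip(spec, filt)])
        for t in range(n):
            # Keep the real part; with a real, symmetric filter
            # (filt[k] == filt[N - k]) the imaginary part is only round-off.
            out[t][ch] = mixed[t].real
    return out


def depthwise_conv(tokens, kernels):
    """Local branch (hypothetical): per-channel 1-D convolution over
    neighboring tokens, kernel size 3, zero padding.

    kernels: C lists of 3 weights, one kernel per channel.
    """
    n, c = len(tokens), len(tokens[0])
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        for t in range(n):
            acc = 0.0
            for w, offset in zip(kernels[ch], (-1, 0, 1)):
                s = t + offset
                if 0 <= s < n:  # zero padding at the sequence ends
                    acc += w * tokens[s][ch]
            out[t][ch] = acc
    return out


def conv_aided_fourier_mixer(tokens, filt, kernels):
    """Residual + global (Fourier) + local (depthwise conv) token mixing."""
    g = fourier_mix(tokens, filt)
    loc = depthwise_conv(tokens, kernels)
    return [[x + a + b for x, a, b in zip(tx, ta, tb)]
            for tx, ta, tb in zip(tokens, g, loc)]


# Usage: N = 4 tokens with C = 2 channels.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
dc_only = [1.0, 0.0, 0.0, 0.0]    # keep only the zero-frequency component
identity = [[0.0, 1.0, 0.0]] * 2  # local branch passes each token through
out = conv_aided_fourier_mixer(tokens, dc_only, identity)
print([[round(v, 6) for v in row] for row in out])
# [[6.0, 9.0], [10.0, 13.0], [14.0, 17.0], [18.0, 21.0]]
```

Keeping only the zero-frequency coefficient turns the global branch into a per-channel mean over all tokens, so every output token receives information from the whole sequence, while the depthwise convolution only sees each token's immediate neighbors. This is the global/local complementarity the abstract attributes to combining Fourier-based mixing with convolution.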

Joint Multi-Scale CNN and Vision Transformer for Hyperspectral Image Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification

Multi-Granularity Vision Transformer via Semantic Token for Hyperspectral Image Classification

Hyperspectral Image Classification Using Hierarchical Spatial-Spectral Transformer

MSCC-ViT: A Multiscale Visual-Transformer Network Using Convolution Crossing Attention for Hyperspectral Unmixing

Hybrid Vision Transformer Model for Hyperspectral Image Classification

Multiattention Joint Convolution Feature Representation with Lightweight Transformer for Hyperspectral Image Classification

Multiple Vision Architectures-Based Hybrid Network for Hyperspectral Image Classification

Multi-scale Spectral-Spatial Dual-Transformer Network for Hyperspectral Image Classification

Convolution Transformer Fusion Splicing Network for Hyperspectral Image Classification

A Dual-Branch Multiscale Transformer Network for Hyperspectral Image Classification

Hybrid Conv-ViT Network for Hyperspectral Image Classification

Hyperspectral Image Classification Based on Multi-Scale Convolutional Features and Multi-Attention Mechanisms

Hybrid Multi-Scale Spatial-Spectral Transformer for Hyperspectral Image Classification

Multi-Scale Super Token Transformer for Hyperspectral Image Classification

A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

MHCFormer: Multiscale Hierarchical Conv-Aided Fourierformer for Hyperspectral Image Classification

End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

CNN and Transformer Hybrid Network for Hyperspectral Image Classification

Tripartite-Structure Transformer for Hyperspectral Image Classification