Asymmetric Residual Transformer for Hyperspectral Image Classification Using Limited Training Samples

Yaxiu Sun, Minhui Wang, Jianhong Xiang, Lihong Hao, Rui Sun, Yaruo Wu
DOI: https://doi.org/10.1145/3638682.3638693
2024-01-01
Abstract: In recent years, combining convolutional neural networks (CNNs) with transformers has become a trend. CNNs are good at capturing local information, while transformers are good at capturing global image features. Hyperspectral images (HSIs) carry rich spatial information, and their hundreds of contiguous spectral bands contain deep sequential semantic information. Combining the two can therefore learn the data's features more fully: the CNN extracts local spatial information, and the transformer extracts band-sequence information. This paper proposes a new model, ARFormer, which combines an asymmetric residual connection (AR) block with an improved vision transformer. First, the AR block uses three-dimensional (3D) asymmetric convolution and residual connections, which extract abundant feature information with fewer convolutional layers and model parameters. Then, the improved vision transformer introduces Re-attention, which increases the diversity of the attention maps at different levels. Experiments are carried out on three well-known hyperspectral datasets: Indian Pines (IP), Trento (TR), and Pavia University (UP). The HSI classification results show that ARFormer achieves better classification accuracy than several existing models, especially when training samples are limited.
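The claim that asymmetric 3D convolution needs fewer parameters can be illustrated with a simple count. The sketch below is not the ARFormer implementation: the abstract does not state how the kernels are factorized, so the split of a 3x3x3 kernel into a 1x3x3 spatial kernel followed by a 3x1x1 spectral kernel, the channel widths, and the helper names `conv3d_params` and `asymmetric_pair_params` are all assumptions made for illustration.

```python
def conv3d_params(in_ch, out_ch, kernel, bias=True):
    """Learnable parameters of one 3D convolution layer."""
    kd, kh, kw = kernel
    weights = in_ch * out_ch * kd * kh * kw
    return weights + (out_ch if bias else 0)


def asymmetric_pair_params(in_ch, out_ch,
                           spatial_kernel=(1, 3, 3),
                           spectral_kernel=(3, 1, 1)):
    """Parameters of a spatial conv followed by a spectral conv.

    The kernel shapes are an assumed factorization, not taken from the paper.
    """
    spatial = conv3d_params(in_ch, out_ch, spatial_kernel)
    spectral = conv3d_params(out_ch, out_ch, spectral_kernel)
    return spatial + spectral


# Hypothetical channel widths, chosen only to make the comparison concrete.
full = conv3d_params(8, 8, (3, 3, 3))   # 8*8*27 + 8 = 1736
asym = asymmetric_pair_params(8, 8)     # (8*8*9 + 8) + (8*8*3 + 8) = 784

print(f"full 3x3x3: {full} parameters")
print(f"asymmetric pair: {asym} parameters")
```

Under these assumptions the factorized pair uses fewer than half the parameters of the full kernel while covering the same 3x3x3 receptive field; the savings in the actual AR block depend on the kernel shapes and channel widths the authors chose.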