Hybrid Vision Transformer Model for Hyperspectral Image Classification.

Jiaqi Yang, Bo Du, Chen Wu
DOI: https://doi.org/10.1109/igarss46834.2022.9884262
2022-01-01
Abstract: Owing to its local connectivity, the convolutional neural network (CNN) can effectively extract detailed contextual information. Consequently, a large number of CNN-based methods have been introduced for hyperspectral image (HSI) classification. However, the receptive fields of these methods are greatly limited, and their information extraction is usually inadequate. Recently, the transformer structure has attracted extensive attention owing to its ability to capture global dependencies. With a self-attention mechanism, the transformer can extract long-tail distributions and model global features to enhance the representation of data. It is therefore natural to combine the CNN and the transformer to obtain both local detail and global distribution. In this paper, we propose a hybrid vision transformer model (Hybrid ViT), consisting of a convolution block and a vision transformer block, to jointly learn the global and local information of HSI. With this unified architecture, the Hybrid ViT model can not only capture detailed features of narrow targets but also extract the global distribution of large objects. Experimental results on benchmark HSI datasets demonstrate that the proposed Hybrid ViT outperforms other methods, with higher classification accuracy and finer classification maps.
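The abstract does not give the layer configuration, so the following is only a minimal sketch of the general idea of a convolution step (local detail) followed by a self-attention step (global dependency), written in plain Python. Every name, shape, and weight here (`conv1d_same`, `self_attention`, `hybrid_forward`, the smoothing kernel, the random matrices, convolving along the spectral axis) is a hypothetical choice for illustration and should not be read as the authors' Hybrid ViT.

```python
import math
import random

def conv1d_same(x, kernel):
    """Local step: slide a small kernel over one pixel's spectrum (zero padded, same length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def self_attention(tokens, wq, wk, wv):
    """Global step: single-head scaled dot-product attention over all tokens."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def hybrid_forward(patch, kernel, wq, wk, wv, w_out):
    """Convolution per pixel, attention across pixels, mean pooling, linear classifier."""
    # Assumption: each pixel of the patch is one token; the convolution runs along its bands.
    tokens = [conv1d_same(pixel, kernel) for pixel in patch]
    attended, weights = self_attention(tokens, wq, wk, wv)
    # Residual connection keeps the local features alongside the global ones.
    mixed = [[t + a for t, a in zip(t_row, a_row)] for t_row, a_row in zip(tokens, attended)]
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]
    logits = matmul([pooled], w_out)[0]
    return softmax(logits), weights

def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy usage: a 3x3 spatial patch flattened to 9 pixels, 8 spectral bands, 3 classes.
random.seed(0)
bands, n_pixels, d_attn, n_classes = 8, 9, 4, 3
patch = [[random.random() for _ in range(bands)] for _ in range(n_pixels)]
probs, weights = hybrid_forward(
    patch,
    [0.25, 0.5, 0.25],            # hypothetical smoothing kernel
    rand_matrix(bands, d_attn),   # query projection
    rand_matrix(bands, d_attn),   # key projection
    rand_matrix(bands, bands),    # value projection (same width as tokens for the residual)
    rand_matrix(bands, n_classes),
)
print(len(probs), len(weights))
```

In this sketch the kernel only sees three neighboring bands at a time, whereas each row of `weights` spans all nine pixels of the patch, which is the local versus global contrast the abstract describes.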