Hierarchical Attention Transformer for Hyperspectral Image Classification

Tahir Arshad, Junping Zhang
DOI: https://doi.org/10.1109/lgrs.2024.3379509
IF: 5.343
2024-04-05
IEEE Geoscience and Remote Sensing Letters
Abstract: Hyperspectral image (HSI) data contain rich spectral–spatial information that can be useful for a variety of applications, and many methods have been proposed to classify HSIs. Nonetheless, the limited availability of training samples frequently weakens the ability of traditional models to handle the inherent complexity of the task. Deep learning models have been successfully applied in the field of remote sensing. In this letter, we propose a vision transformer (ViT)-based network, called the hierarchical attention transformer, that combines the local representation learning of 3-D and 2-D convolutional neural networks (CNNs) with the potent global modeling capabilities of ViT. We leverage the efficiency of window-based self-attention: within each window, dedicated tokens contribute to both local and global representation learning. The proposed model achieved overall accuracies (OA) of 99.70%, 99.89%, 99.56%, 81.75%, and 99.59% on five datasets.
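The overall accuracy (OA) quoted in the abstract is conventionally the fraction of labeled test pixels whose predicted class matches the reference class. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name `overall_accuracy` and the toy labels are hypothetical.

```python
def overall_accuracy(y_true, y_pred):
    """Return the fraction of samples whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one labeled sample is required")
    # Count positions where the prediction agrees with the reference.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels for illustration, not data from the paper.
reference = [0, 1, 2, 2, 1, 0, 2, 1]
predicted = [0, 1, 2, 1, 1, 0, 2, 1]

oa = overall_accuracy(reference, predicted)
print(f"OA = {100 * oa:.2f}%")
```

Because OA weights every sample equally, it can be dominated by the largest classes, which is why HSI studies commonly report it alongside per-class measures.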
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics