Dual Branch Masked Transformer for Hyperspectral Image Classification

Kuo Li, Yushi Chen, Lingbo Huang
DOI: https://doi.org/10.1109/lgrs.2024.3490534
IF: 5.343
2024-11-20
IEEE Geoscience and Remote Sensing Letters
Abstract: Transformers have been widely used in hyperspectral image (HSI) classification because of their ability to capture long-range dependencies. However, most Transformer-based classification methods either fail to extract local information or do not combine spatial and spectral information well, which leads to insufficient feature extraction. To address these issues, this study proposes a dual-branch masked Transformer (Dual-MTr) model. A masked Transformer (MTr) is used to pretrain a vision transformer (ViT) by reconstructing both masked spatial images and masked spectra; recovering the global original input from localized patches embeds a local bias in the model. Different tokenization methods are used for the different types of input data: patch embedding with overlapping regions for 2-D spatial data, and group embedding for 1-D spectral data. Supervised learning is added to the pretraining process to enhance discriminability. A dual-branch structure is then used to combine the spatial and spectral features. To strengthen the connection between the two branches, Kullback-Leibler (KL) divergence is used to measure the difference between the classification results of the two branches, and the resulting loss is incorporated into the training process. Experimental results on two hyperspectral datasets demonstrate the effectiveness of the proposed method compared with other methods.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
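
The KL-divergence coupling between the two branches described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the direction of the divergence (spatial relative to spectral), the use of per-branch cross-entropy terms, and the weighting factor `kl_weight` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)


def dual_branch_loss(spatial_logits, spectral_logits, label, kl_weight=1.0):
    """Hypothetical combined loss for one sample.

    Assumption: each branch has its own classification loss, and a KL term
    penalizes disagreement between the two branches' predictions.
    """
    p_spatial = softmax(spatial_logits)
    p_spectral = softmax(spectral_logits)
    classification = cross_entropy(p_spatial, label) + cross_entropy(p_spectral, label)
    disagreement = kl_divergence(p_spatial, p_spectral)
    return classification + kl_weight * disagreement


if __name__ == "__main__":
    # Two branches that disagree on a 3-class problem; the true class is 0.
    spatial = [2.0, 0.5, 0.1]
    spectral = [0.3, 1.8, 0.2]
    print("with KL term:   ", dual_branch_loss(spatial, spectral, label=0))
    print("without KL term:", dual_branch_loss(spatial, spectral, label=0, kl_weight=0.0))
```

When both branches produce the same distribution, the KL term is zero and only the classification terms remain; the more the branches disagree, the larger the added penalty. A symmetric variant (averaging both directions) would be an equally plausible reading of the abstract.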