A Center-Masked Transformer for Hyperspectral Image Classification
Sen Jia, Yifan Wang, Shuguo Jiang, Ruyan He
DOI: https://doi.org/10.1109/tgrs.2024.3369075
IF: 8.2
2024-03-12
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Convolutional neural networks (CNNs) are widely used in hyperspectral image (HSI) classification. However, the fixed receptive field of CNN-based methods limits their capability to extract global features. In recent years, transformers have been introduced into networks to tackle this limitation, but they bring other challenges, including a significant increase in model size and in the number of labeled training samples required, as well as the limited effectiveness of sample encoding-reconstruction pretraining methods for HSI classification. To address these issues, a center-masked transformer (CMT) approach is proposed to improve HSI classification accuracy from two perspectives. On the one hand, a local-to-global token embedding (L2GTE) framework coupled with a multiscale convolutional token embedding (MCTE) module is designed to obtain local and global embedding tokens, which effectively reduces the number of model parameters. On the other hand, a regularized center-masked pretraining (RCPT) task is proposed and introduced into a transformer-based network for the first time, which enables the network to learn the dependencies between central ground objects and neighboring objects without labels during pretraining. Experimental results on five public HSI datasets demonstrate that our CMT approach outperforms other state-of-the-art methods for HSI classification when training samples are insufficient.
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics
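
To make the center-masking idea in the abstract more concrete, the following is a minimal sketch of how an input for such a pretraining task might be prepared: the spectrum of the center pixel of an HSI patch is hidden, and the original spectrum is kept as a label-free target. This is an illustration only, not the authors' RCPT implementation. The function name `mask_center`, the zero fill value, the 7x7 patch size, and the 30-band depth are all assumptions; the abstract does not specify the masking value, the regularization, or the loss.

```python
import numpy as np


def mask_center(patch, mask_value=0.0):
    """Hide the center pixel of an HSI patch (hypothetical helper).

    patch: array of shape (height, width, bands) with odd height and width.
    Returns (masked_patch, center_spectrum), where center_spectrum is the
    original, unmasked spectrum that a network could be asked to recover.
    """
    if patch.ndim != 3:
        raise ValueError("expected a patch of shape (height, width, bands)")
    height, width, _ = patch.shape
    if height % 2 == 0 or width % 2 == 0:
        raise ValueError("height and width must be odd so a center pixel exists")

    row, col = height // 2, width // 2
    center_spectrum = patch[row, col].copy()  # self-supervised target
    masked_patch = patch.copy()               # leave the input untouched
    masked_patch[row, col] = mask_value       # assumed fill value: zeros
    return masked_patch, center_spectrum


# Synthetic data standing in for a real hyperspectral patch.
rng = np.random.default_rng(0)
patch = rng.random((7, 7, 30))
masked, target = mask_center(patch)
```

In this sketch the neighboring pixels are left unchanged, so a model trained to predict `target` from `masked` can rely only on the surrounding context, which matches the abstract's description of learning dependencies between central and neighboring ground objects without labels.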