Vision Transformer With Contrastive Learning for Hyperspectral Image Classification

Heng Zhou, Xin Zhang, Chunlei Zhang, Qiaoyu Ma
DOI: https://doi.org/10.1109/LGRS.2023.3255867
IF: 5.343
2023-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: The vision transformer (ViT) has become a hot topic in image processing due to its global feature extraction capabilities. However, the ViT suffers from over-smoothing in feature extraction and over-fitting in the training procedure, so it is hard to achieve satisfactory performance in hyperspectral image (HSI) classification. To address these issues, we propose a ViT with contrastive learning (CViT). The network architecture includes a patch embedding module, transformer blocks, and a classifier. The training of CViT can be considered as an optimization problem with a supervised contrastive loss, an unsupervised contrastive loss, and an $\ell_{1}$-regularizer with respect to linear self-attention weights. Specifically, the supervised contrastive loss is proposed to alleviate the negative effects of HSI features' spectral variability and spatial diversity by increasing intraclass consistency. On the other hand, the unsupervised contrastive loss is exploited to reduce redundancy by reconstructing global structural information. In particular, regularized linear self-attention weights reduce the over-smoothing issue. Extensive experimental results on three HSI datasets demonstrate that the proposed CViT achieves competitive performance.
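The three-term training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss definitions, so everything below is an assumption: the supervised term uses the commonly used SupCon form over L2-normalized features, the unsupervised term is passed in as an already computed number, and the names `sup_con_loss`, `l1_penalty`, `total_loss`, `temperature`, `alpha`, and `beta` are hypothetical and not taken from the paper.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    # L2-normalize a feature vector; a zero vector is returned unchanged.
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [a / norm for a in v]


def sup_con_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (SupCon form).

    Assumption: this is a standard formulation, not necessarily the
    definition used by CViT. Same-class samples act as positives.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def l1_penalty(weights):
    # Sum of absolute values of a weight matrix given as a list of rows.
    return sum(abs(w) for row in weights for w in row)


def total_loss(sup, unsup, attn_weights, alpha=1.0, beta=1e-3):
    # Hypothetical weighting of the three terms; alpha and beta are assumed.
    return sup + alpha * unsup + beta * l1_penalty(attn_weights)
```

In this sketch, features that cluster by class give a small supervised term, which matches the stated goal of increasing intraclass consistency, while the `beta`-weighted penalty pushes the self-attention weights toward sparsity.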
Computer Science