S2HM2: A Spectral–Spatial Hierarchical Masked Modeling Framework for Self-Supervised Feature Learning and Classification of Large-Scale Hyperspectral Images

Lilin Tu, Xing Xie, Jiayi Li, Xin Huang, Leiguang Wang, Jianya Gong
DOI: https://doi.org/10.1109/TGRS.2024.3392962
IF: 8.2
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Most existing deep learning-based hyperspectral image (HSI) classification algorithms rely on supervised learning, which requires a large number of annotated labels that are costly to acquire. Self-supervised learning (SSL) methods can learn rich representations from large amounts of unlabeled data, thereby reducing the reliance on labels. In particular, SSL based on masked image modeling (MIM) can extract fine-grained features, which makes it well suited to HSI classification as a pixel-level interpretation task. However, MIM has scarcely been investigated for HSI classification. Current algorithms lack a comprehensive consideration of the multiscale spectral–spatial characteristics of HSI when constructing the pretraining task, and they incur high computational cost and redundancy when applied to large-scale HSIs. Therefore, this article develops an SSL framework based on spectral–spatial hierarchical masked modeling (S2HM2) for large-scale HSI classification. Considering the spectral–spatial characteristics of HSI, a 3-D masking strategy and a spectral–spatial consistency loss are proposed to construct the MIM task. To fully exploit features at each scale, a hierarchical 3-D feature pyramid network (3D-FPN) is designed as the decoder for both pretext and downstream tasks in a “pixel-to-pixel” manner. In addition, a multiscale masked feature modeling (MS-MFM) task is proposed to further facilitate multiscale feature learning. The SSL pretraining is guided by both MIM and MS-MFM. Experimental results on two large-scale hyperspectral datasets, i.e., WHU-OHS and WHU-H2SR, demonstrate the superiority of the proposed method. Furthermore, transfer learning experiments are conducted on a variety of hyperspectral datasets, where classification accuracies are boosted in most scenarios. The source code will be made available at https://github.com/tulilin/S2HM2.
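The abstract does not specify how the 3-D masking strategy is implemented. As a rough illustration of the general idea only, the sketch below masks random blocks of a hyperspectral cube jointly along the spectral and spatial axes and computes a reconstruction error over the masked voxels. The function names (`random_3d_mask`, `masked_mse`), the block size, the mask ratio, and the plain masked MSE are all assumptions made for this example; they are not taken from the paper, and the masked MSE is a generic stand-in rather than the proposed spectral–spatial consistency loss.

```python
import numpy as np

def random_3d_mask(shape, block=(4, 8, 8), mask_ratio=0.5, seed=0):
    """Boolean mask for a cube of `shape` = (bands, height, width); True = masked.

    Hypothetical helper: masks whole 3-D blocks so that each masked region
    spans several bands and several pixels at once.
    """
    rng = np.random.default_rng(seed)
    # Number of blocks along each axis (ceiling division covers partial edge blocks).
    grid = tuple(-(-s // b) for s, b in zip(shape, block))
    n_blocks = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_blocks))
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.choice(n_blocks, size=n_masked, replace=False)] = True
    full = flat.reshape(grid)
    # Expand the block-level mask to voxel resolution, then crop to the cube size.
    for axis, b in enumerate(block):
        full = np.repeat(full, b, axis=axis)
    return full[:shape[0], :shape[1], :shape[2]]

def masked_mse(original, reconstructed, mask):
    """Mean squared error restricted to masked voxels (generic stand-in loss)."""
    diff = (original - reconstructed)[mask]
    return float(np.mean(diff ** 2))

# Synthetic cube: 32 bands, 64 x 64 pixels (illustrative sizes only).
cube = np.random.default_rng(1).random((32, 64, 64)).astype(np.float32)
mask = random_3d_mask(cube.shape, block=(4, 8, 8), mask_ratio=0.5)
masked_cube = np.where(mask, 0.0, cube)  # masked voxels are zeroed out
```

In an MIM setup of this kind, an encoder-decoder network would receive `masked_cube` and be trained to reconstruct `cube`, with the loss evaluated on the masked positions.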
Environmental Science, Engineering, Computer Science
What problem does this paper attempt to address?