Multiform Ensemble Self-Supervised Learning for Few-Shot Remote Sensing Scene Classification

Jianzhao Li, Maoguo Gong, Huilin Liu, Yourun Zhang, Mingyang Zhang, Yue Wu
DOI: https://doi.org/10.1109/tgrs.2023.3234252
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Self-supervised learning is an effective way to solve model collapse for few-shot remote sensing scene classification (FSRSSC). However, most self-supervised contrastive learning auxiliary tasks perform poorly on the high interclass similarity problem in FSRSSC. Furthermore, it is time-consuming and computationally expensive to find the best combination among numerous self-supervised auxiliary tasks. In practical applications, we may encounter difficulties in both remote sensing data acquisition and labeling, while most FSRSSC studies focus only on the former. To alleviate these problems, we propose a multiform ensemble self-supervised learning (MES2L) framework for FSRSSC in this article. Based on the transfer learning-based few-shot scheme, we design a novel global–local contrastive learning auxiliary task to solve the low interclass separability problem. A self-attention mechanism is applied to the local contrast features to investigate the intrinsic associations between different remote sensing scene objectives. We also present a multiform ensemble enhancement (MEE) training method. Ensemble enhancement involves the concatenation of features extracted from different backbones trained by a combination of multiform self-supervised auxiliary tasks. MEE not only can be regarded as a more straightforward alternative to knowledge distillation but also achieves an effective compromise between computational cost and classification accuracy. In addition, we provide two scene classification schemes, for inductive and transductive settings, which address the difficulties of remote sensing data acquisition and labeling, respectively. The proposed network achieves state-of-the-art results on three benchmark FSRSSC datasets. The potential of the MES2L framework is also demonstrated in combination with classical metalearning-based and metric learning-based few-shot algorithms.
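The ensemble step described in the abstract (concatenating features extracted from different backbones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the L2 normalization before concatenation, and the nearest-prototype classifier are all assumptions chosen only to make the example self-contained, and the feature vectors stand in for the outputs of hypothetical pretrained backbones.

```python
import math

def l2_normalize(vec):
    # Assumption: each backbone's feature is scaled to unit length so that
    # no single backbone dominates the concatenated vector.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def ensemble_features(per_backbone_features):
    # Concatenate the (normalized) feature vectors that several backbones
    # produced for the same image.
    fused = []
    for feat in per_backbone_features:
        fused.extend(l2_normalize(feat))
    return fused

def class_prototypes(support_features, support_labels):
    # Assumption: a simple nearest-prototype classifier; each prototype is
    # the mean of the fused support features of one class.
    sums, counts = {}, {}
    for feat, label in zip(support_features, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(query_feature, prototypes):
    # Assign the query to the class whose prototype is closest in
    # squared Euclidean distance.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query_feature, prototypes[label]))

# Toy 2-way 1-shot episode with two hypothetical backbones per image.
support = [
    ensemble_features([[1.0, 0.0], [1.0, 0.0]]),  # class "forest"
    ensemble_features([[0.0, 1.0], [0.0, 1.0]]),  # class "harbor"
]
prototypes = class_prototypes(support, ["forest", "harbor"])
query = ensemble_features([[0.9, 0.1], [0.8, 0.2]])
predicted = classify(query, prototypes)
```

In this sketch the backbones are frozen at inference time and only their outputs are combined, which is one plausible reading of why the abstract presents concatenation as a simpler alternative to knowledge distillation.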