Distilling Segmenters from CNNs and Transformers for Remote Sensing Images' Semantic Segmentation.

Zhe Dong, Guoming Gao, Tianzhu Liu, Yanfeng Gu, Xiangrong Zhang
DOI: https://doi.org/10.1109/tgrs.2023.3290411
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Semantic segmentation is a crucial task in remote sensing and has been performed predominantly with convolutional neural networks (CNNs) over the past decade. Recently, transformers with self-attention mechanisms have demonstrated superior performance compared with CNNs. However, because of the locality of CNNs and the high computational complexity and massive data requirements of transformers, neither can be applied well in resource-constrained, practical remote sensing scenarios. Motivated by the limitations of using either CNNs or transformers alone for semantic segmentation of remote sensing images, this article proposes a novel cross-model knowledge distillation (KD) framework, named distilling segmenters from CNNs and transformers (DSCTs), that harnesses the complementary advantages of both models. The framework uses a channel-weighted attention-guided feature distillation (CAFD) module to condense the teacher model's features and strengthen the student model's focus on the regions the teacher attends to. In addition, a target-nontarget KD (TNKD) module is proposed, which decouples logit distillation into target and nontarget KD to guide the student model in learning the underlying representations and decision boundaries of the teacher model. By learning complementary knowledge from the teacher, the proposed DSCT framework improves the student's segmentation performance without adding trainable parameters. Experiments on four available remote sensing datasets (ISPRS Potsdam, ISPRS Vaihingen, GID, and LoveDA) indicate that the proposed DSCT outperforms state-of-the-art KD methods, demonstrating its effectiveness and robustness.
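The abstract does not give the TNKD loss itself, so the following is only a minimal sketch of what "decoupling logit distillation into target and nontarget KD" can look like for per-pixel segmentation logits. It assumes the decoupled knowledge distillation (DKD) formulation: the target part is a KL divergence between binary [p(target), 1 - p(target)] distributions, and the non-target part is a KL divergence between the distributions renormalized over the C - 1 non-target classes. The function names and the `alpha`, `beta`, and `tau` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the class axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def kl_div(p, q, eps=1e-12):
    # KL(p || q) for each row, summed over the last (class) axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def target_nontarget_kd(student_logits, teacher_logits, labels,
                        alpha=1.0, beta=1.0, tau=1.0):
    """Hypothetical DKD-style target / non-target distillation loss.

    student_logits, teacher_logits: (N, C) arrays, N = flattened pixels.
    labels: (N,) integer ground-truth class of each pixel.
    alpha, beta, tau: assumed loss weights and temperature.
    """
    n, c = student_logits.shape
    target_mask = np.zeros((n, c), dtype=bool)
    target_mask[np.arange(n), labels] = True

    p_s = softmax(student_logits / tau)
    p_t = softmax(teacher_logits / tau)

    # Target part: binary distribution [p(target), 1 - p(target)] per pixel
    pt_s = p_s[target_mask]
    pt_t = p_t[target_mask]
    b_s = np.stack([pt_s, 1.0 - pt_s], axis=1)
    b_t = np.stack([pt_t, 1.0 - pt_t], axis=1)
    target_kd = kl_div(b_t, b_s)

    # Non-target part: mask out the target logit, then renormalize
    # over the remaining C - 1 classes
    q_s = softmax(np.where(target_mask, -np.inf, student_logits / tau))
    q_t = softmax(np.where(target_mask, -np.inf, teacher_logits / tau))
    nontarget_kd = kl_div(q_t, q_s)

    # Average over pixels; tau**2 keeps gradient scale comparable across temperatures
    return (alpha * target_kd + beta * nontarget_kd).mean() * tau ** 2


def flatten_logits(x):
    # (B, C, H, W) segmentation logits -> (B*H*W, C) per-pixel rows
    b, c, h, w = x.shape
    return x.transpose(0, 2, 3, 1).reshape(-1, c)


# Usage with random logits standing in for student and teacher outputs
rng = np.random.default_rng(0)
B, C, H, W = 2, 6, 4, 4
s = rng.normal(size=(B, C, H, W))
t = rng.normal(size=(B, C, H, W))
y = rng.integers(0, C, size=(B, H, W))
loss = target_nontarget_kd(flatten_logits(s), flatten_logits(t), y.reshape(-1), tau=2.0)
print(loss)
```

Weighting the two terms separately is the point of the decoupling: for a single pixel, choosing `alpha = 1` and `beta = 1 - p_teacher(target)` recovers the ordinary KL divergence between the full class distributions, so independent weights let the non-target "dark knowledge" be emphasized rather than suppressed when the teacher is confident.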
What problem does this paper attempt to address?