Abstract: Class incremental semantic segmentation (CISS), which targets the practical open-world setting, has recently attracted increasing research interest; its main challenge is the well-known issue of catastrophic forgetting. Knowledge distillation (KD) techniques have been widely studied to alleviate catastrophic forgetting. Despite their promising performance, existing KD-based methods generally apply the same distillation scheme to different intermediate layers when transferring old knowledge, and rely on manually tuned, fixed trade-off weights to control the effect of KD. These methods do not account for the feature characteristics of different intermediate layers, which limits the effectiveness of KD for CISS. In this paper, we propose a layer-specific knowledge distillation (LSKD) method that assigns appropriate distillation schemes and weights to the various intermediate layers according to their feature characteristics, aiming to further explore the potential of KD for improving CISS performance. Specifically, we present mask-guided distillation (MD) to alleviate the background shift on semantic features; it performs distillation while masking the features affected by the background. Furthermore, we present mask-guided context distillation (MCD) to exploit the global context information in high-level semantic features. Building on these two schemes, LSKD assigns different distillation schemes to layers according to their feature characteristics. To adjust the effect of layer-specific distillation adaptively, LSKD introduces a regularized gradient equilibrium method that learns dynamic trade-off weights. Additionally, LSKD attempts to learn the distillation schemes and trade-off weights of different layers simultaneously by developing a bi-level optimization method. Extensive experiments on the widely used Pascal VOC 12 and ADE20K datasets show that LSKD clearly outperforms its counterparts and achieves state-of-the-art results.
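
To make the idea of masked feature distillation concrete, here is a minimal sketch in plain Python. It is not the authors' MD loss: the abstract does not say how the mask is built or which distance is used, so this example assumes a per-pixel squared-error distance and takes a binary keep-mask as input. The names `masked_distillation_loss` and `keep_mask` are hypothetical.

```python
from typing import Sequence

# One feature vector (list of floats) per pixel; an assumed, simplified layout.
Features = Sequence[Sequence[float]]


def masked_distillation_loss(
    old_feats: Features,
    new_feats: Features,
    keep_mask: Sequence[int],
) -> float:
    """Mean squared distance between old- and new-model features,
    averaged only over pixels whose mask entry is non-zero."""
    if not (len(old_feats) == len(new_feats) == len(keep_mask)):
        raise ValueError("inputs must cover the same number of pixels")
    total = 0.0
    kept = 0
    for f_old, f_new, keep in zip(old_feats, new_feats, keep_mask):
        if not keep:
            continue  # masked-out pixel contributes nothing to the loss
        total += sum((a - b) ** 2 for a, b in zip(f_old, f_new))
        kept += 1
    # Assumption: return 0.0 when every pixel is masked out.
    return total / kept if kept else 0.0


# Three pixels with two channels each; the third pixel is masked out.
old = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
new = [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
print(masked_distillation_loss(old, new, [1, 1, 0]))
print(masked_distillation_loss(old, new, [1, 1, 1]))
```

With the third pixel masked out, its large feature difference no longer enters the average, so the loss constrains the new model only on the pixels that are kept.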

Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

An Internal-External Constrained Distillation Framework for Continual Semantic Segmentation

Multi-task Self-distillation for Graph-based Semi-Supervised Learning

Channel Self-Supervision for Online Knowledge Distillation

An Attention-based Representation Distillation Baseline for Multi-Label Continual Learning

Learning to Predict Gradients for Semi-Supervised Continual Learning

Contrastive Supervised Distillation for Continual Representation Learning

Online Distillation with Continual Learning for Cyclic Domain Shifts

Uncertainty-Aware Distillation for Semi-Supervised Few-Shot Class-Incremental Learning

Self-Training and Curriculum Learning Guided Dynamic Refined Network for Remote Sensing Class-Incremental Semantic Segmentation

Improving Structural and Semantic Global Knowledge in Graph Contrastive Learning with Distillation

Complementary Calibration: Boosting General Continual Learning With Collaborative Distillation and Self-Supervision

Uncertainty-Aware Contrastive Distillation for Incremental Semantic Segmentation

Self Supervision to Distillation for Long-Tailed Visual Recognition

Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning

ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

CLDG: Contrastive Learning on Dynamic Graphs

Rethinking the Representational Continuity: Towards Unsupervised Continual Learning

Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation

DC2T: Disentanglement-Guided Consolidation and Consistency Training for Semi-Supervised Cross-Site Continual Segmentation

Layer-Specific Knowledge Distillation for Class Incremental Semantic Segmentation