Abstract: In this paper, we introduce a novel knowledge distillation approach for the semantic segmentation task. Unlike previous methods that rely on power-trained teachers or other modalities to provide additional knowledge, our approach does not require complex teacher models or information from extra sensors. Specifically, for the teacher model training, we propose to noise the label and then incorporate it into the input to effectively boost the lightweight teacher's performance. To ensure the robustness of the teacher model against the introduced noise, we propose a dual-path consistency training strategy featuring a distance loss between the outputs of the two paths. For the student model training, we keep it consistent with standard distillation for simplicity. Our approach not only boosts the efficacy of knowledge distillation but also increases the flexibility in selecting teacher and student models. To demonstrate the advantages of our Label Assisted Distillation (LAD) method, we conduct extensive experiments on five challenging datasets (Cityscapes, ADE20K, PASCAL-VOC, COCO-Stuff 10K, and COCO-Stuff 164K) and five popular models (FCN, PSPNet, DeepLabV3, STDC, and OCRNet). The results show the effectiveness and generalization of our approach. We posit that incorporating labels into the input, as demonstrated in our work, will provide valuable insights into related fields. Code is available at <a class="link-external link-https" href="https://github.com/skyshoumeng/Label_Assisted_Distillation" rel="external noopener nofollow">this https URL</a>.
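
As a rough illustration of the two teacher-side ideas in the abstract (feeding a noised label into the input, and a distance loss between the outputs of two paths), here is a minimal NumPy sketch. It is not taken from the authors' code: the function names are hypothetical, channel-wise concatenation of a one-hot label is an assumed way of "incorporating" the label, and mean squared error between softmax outputs is an assumed choice of distance.

```python
import numpy as np

def one_hot(label, num_classes):
    """Turn an (H, W) integer label map into an (H, W, C) one-hot array."""
    return np.eye(num_classes, dtype=np.float32)[label]

def teacher_input(image, noisy_label, num_classes):
    """Assumption: the noised label is appended to the image as extra channels."""
    return np.concatenate([image, one_hot(noisy_label, num_classes)], axis=-1)

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Assumed distance: mean squared error between the two paths' class probabilities."""
    prob_a, prob_b = softmax(logits_a), softmax(logits_b)
    return float(np.mean((prob_a - prob_b) ** 2))
```

With a 3-channel image and `num_classes` classes, `teacher_input` returns an array with `3 + num_classes` channels. `consistency_loss` is zero when both paths produce identical logits and grows as their predictions diverge.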

What problem does this paper attempt to address?

The paper asks how to improve the performance of lightweight student models in semantic segmentation through knowledge distillation while reducing the dependence on complex teacher models or data from additional modalities. The authors propose a new knowledge distillation method, Label Assisted Distillation (LAD). Noisy labels are added to the teacher model's input during training to boost its performance, and a two-path consistency training strategy makes the teacher robust to that noise. The method improves the effectiveness of knowledge distillation and increases the flexibility in choosing teacher and student models.

### Main contributions of the paper include:

1. **Proposing a new knowledge distillation method**: Noisy labels serve as privileged information, which reduces the dependence on complex teacher models or other modalities and improves distillation performance.
2. **Introducing a two-path consistency training strategy**: To make the teacher model robust to the introduced noise, a distance loss minimizes the difference between the outputs of the two paths.
3. **Extensive experimental verification**: Experiments were carried out on five popular semantic segmentation models (FCN, PSPNet, DeepLabV3, STDC, OCRNet) and five challenging datasets (Cityscapes, ADE20K, PASCAL-VOC, COCO-Stuff 10K, COCO-Stuff 164K). The results show a significant and consistent performance improvement.

### Method overview:

- **Label Noise Module (LNM)**: Class-level noise and pixel-level noise are applied to the labels, and the resulting noisy labels are fed to the teacher model as input.
- **Two-path consistency training**: During teacher training, two independently sampled parameters are used to generate two noisy labels, which are fed into two identical teacher models. A consistency loss keeps the outputs of the two models in agreement.
- **Student model training**: The student is trained as in standard knowledge distillation, guided jointly by label supervision and teacher supervision.

### Experimental results:

- **Comparison with existing methods**: Across multiple datasets and models, the proposed method outperforms several existing knowledge distillation methods, such as SKD, IFVD, CWD, and CIRKD.
- **Generalization ability**: Experiments on different datasets and models further support the generalization ability of the method.

In summary, by introducing noisy labels as privileged information, the paper proposes a knowledge distillation method that improves the performance of lightweight student models in semantic segmentation and reduces the dependence on complex teacher models.
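
The label noise step and the two-path sampling described in the method overview can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the summary does not specify how class-level or pixel-level noise is defined, so here class-level noise remaps a whole class to a random class ID and pixel-level noise replaces individual pixels with random class IDs. All function names and the `max_p` parameter are hypothetical.

```python
import numpy as np

def class_level_noise(label, num_classes, p_class, rng):
    """Assumption: with probability p_class, every pixel of a class is remapped to one random class."""
    mapping = np.arange(num_classes)
    swap = rng.random(num_classes) < p_class
    mapping[swap] = rng.integers(0, num_classes, size=int(swap.sum()))
    return mapping[label]

def pixel_level_noise(label, num_classes, p_pixel, rng):
    """Assumption: each pixel is independently replaced by a random class with probability p_pixel."""
    flip = rng.random(label.shape) < p_pixel
    random_ids = rng.integers(0, num_classes, size=label.shape)
    return np.where(flip, random_ids, label)

def noisy_label(label, num_classes, p_class, p_pixel, rng):
    """Apply class-level noise first, then pixel-level noise."""
    remapped = class_level_noise(label, num_classes, p_class, rng)
    return pixel_level_noise(remapped, num_classes, p_pixel, rng)

def two_path_labels(label, num_classes, rng, max_p=0.5):
    """Draw two independent noise settings and return one noisy label per path."""
    paths = []
    for _ in range(2):
        p_class, p_pixel = rng.uniform(0.0, max_p, size=2)
        paths.append(noisy_label(label, num_classes, p_class, p_pixel, rng))
    return paths
```

A possible usage: `label_a, label_b = two_path_labels(label, 19, np.random.default_rng(0))`, after which each noisy label would be combined with the image and passed through one of the two teacher paths. Setting both probabilities to zero returns the clean label unchanged.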

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Augmentation-free Dense Contrastive Distillation for Efficient Semantic Segmentation

Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation

Adaptive Perspective Distillation for Semantic Segmentation

Guided Distillation for Semi-Supervised Instance Segmentation

Difference-Aware Distillation for Semantic Segmentation

FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation

Teaching What You Should Teach: A Data-Based Distillation Method

Local structure consistency and pixel-correlation distillation for compact semantic segmentation

Self Pseudo Entropy Knowledge Distillation for Semi-supervised Semantic Segmentation

Normalized Feature Distillation for Semantic Segmentation

Efficient Semantic Segmentation Via Self-Attention and Self-Distillation

Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving

Bridging Knowledge Distillation Gap for Few-sample Unsupervised Semantic Segmentation

Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation

Semantic segmentation method for continuous images based on multi-level knowledge distillation

PPDistiller: Weakly-supervised 3D point cloud semantic segmentation framework via point-to-pixel distillation

Robust Teacher: Self-correcting Pseudo-Label-guided Semi-Supervised Learning for Object Detection

Few-shot Class-Incremental Semantic Segmentation via Pseudo-Labeling and Knowledge Distillation