Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

Shoumeng Qiu, Jie Chen, Xinrun Li, Ru Wan, Xiangyang Xue, Jian Pu
2024-07-18
Abstract: In this paper, we introduce a novel knowledge distillation approach for the semantic segmentation task. Unlike previous methods that rely on power-trained teachers or other modalities to provide additional knowledge, our approach does not require complex teacher models or information from extra sensors. Specifically, for the teacher model training, we propose to noise the label and then incorporate it into input to effectively boost the lightweight teacher performance. To ensure the robustness of the teacher model against the introduced noise, we propose a dual-path consistency training strategy featuring a distance loss between the outputs of two paths. For the student model training, we keep it consistent with the standard distillation for simplicity. Our approach not only boosts the efficacy of knowledge distillation but also increases the flexibility in selecting teacher and student models. To demonstrate the advantages of our Label Assisted Distillation (LAD) method, we conduct extensive experiments on five challenging datasets including Cityscapes, ADE20K, PASCAL-VOC, COCO-Stuff 10K, and COCO-Stuff 164K, five popular models: FCN, PSPNet, DeepLabV3, STDC, and OCRNet, and results show the effectiveness and generalization of our approach. We posit that incorporating labels into the input, as demonstrated in our work, will provide valuable insights into related fields. Code is available at https://github.com/skyshoumeng/Label_Assisted_Distillation.
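The abstract's core idea, noising the label and feeding it to the teacher together with the image, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact class-level and pixel-level noise operations are not specified here, so `class_level_noise` (relabeling a whole class to a different random class with probability `p_cls`) and `pixel_level_noise` (resampling each pixel with probability `p_pix`) are hypothetical, as is `teacher_input`, which concatenates one-hot noisy-label channels after the image channels. Pure-Python nested lists stand in for tensors so the sketch runs without dependencies.

```python
import random
from typing import List

Label = List[List[int]]            # H x W grid of class ids
Planes = List[List[List[float]]]   # C x H x W, channel-first


def class_level_noise(label: Label, num_classes: int, p_cls: float,
                      rng: random.Random) -> Label:
    """Hypothetical class-level noise: each class present in the map is,
    with probability p_cls, relabeled wholesale to a different random class."""
    mapping = {}
    for c in sorted({v for row in label for v in row}):
        if num_classes > 1 and rng.random() < p_cls:
            new_c = rng.randrange(num_classes - 1)   # uniform over the other classes
            mapping[c] = new_c if new_c < c else new_c + 1
        else:
            mapping[c] = c
    return [[mapping[c] for c in row] for row in label]


def pixel_level_noise(label: Label, num_classes: int, p_pix: float,
                      rng: random.Random) -> Label:
    """Hypothetical pixel-level noise: each pixel is independently replaced
    by a uniformly random class with probability p_pix."""
    return [[rng.randrange(num_classes) if rng.random() < p_pix else c
             for c in row] for row in label]


def one_hot(label: Label, num_classes: int) -> Planes:
    """Encode an H x W label map as num_classes binary H x W planes."""
    return [[[1.0 if c == k else 0.0 for c in row] for row in label]
            for k in range(num_classes)]


def teacher_input(image: Planes, label: Label, num_classes: int,
                  p_cls: float, p_pix: float, rng: random.Random) -> Planes:
    """Assumed input format: image channels followed by one-hot noisy-label channels."""
    noisy = class_level_noise(label, num_classes, p_cls, rng)
    noisy = pixel_level_noise(noisy, num_classes, p_pix, rng)
    return image + one_hot(noisy, num_classes)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]  # 3-channel 2 x 2 image
    label = [[0, 1], [1, 2]]
    x = teacher_input(image, label, num_classes=3, p_cls=0.2, p_pix=0.1, rng=rng)
    print(len(x))  # 3 image channels + 3 label channels = 6
```

With `p_cls=0` and `p_pix=0` the noisy label equals the ground truth; the noise strengths used above are arbitrary illustrative values, not settings from the paper.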
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to improve lightweight student models for semantic segmentation through knowledge distillation while reducing reliance on complex teacher models or additional modalities. The authors propose Label Assisted Distillation (LAD): during training, noisy labels are added to the teacher model's input to boost its performance, and a dual-path consistency training strategy makes the teacher robust to that noise. According to the authors, the method both improves distillation and gives more freedom in choosing the teacher and student models.

### Main contributions of the paper include:

1. **A new knowledge distillation method**: noisy labels serve as privileged information, which reduces dependence on complex teacher models or other modalities while improving distillation performance.
2. **A dual-path consistency training strategy**: to make the teacher robust to the injected noise, a distance loss minimizes the difference between the outputs of the two paths.
3. **Extensive experimental verification**: experiments cover five popular semantic segmentation models (FCN, PSPNet, DeepLabV3, STDC, OCRNet) and five challenging datasets (Cityscapes, ADE20K, PASCAL-VOC, COCO-Stuff 10K, COCO-Stuff 164K), and the results show significant and consistent performance gains.

### Method overview:

- **Label Noise Module (LNM)**: applies class-level and pixel-level noise to the labels; the resulting noisy labels are used as input to the teacher model.
- **Dual-path consistency training**: during teacher training, two independently sampled sets of noise parameters produce two noisy labels, which are fed into two identical teacher models; a consistency loss keeps the outputs of the two models consistent.
- **Student model training**: the student is trained as in standard knowledge distillation, supervised jointly by the ground-truth labels and the teacher model.

### Experimental results:

- **Comparison with existing methods**: across multiple datasets and models, the proposed method outperforms several advanced knowledge distillation methods, including SKD, IFVD, CWD, and CIRKD.
- **Generalization ability**: experiments on different datasets and models further demonstrate that the method generalizes.

Overall, by using noisy labels as privileged information, the paper proposes a new knowledge distillation method that improves lightweight student models for semantic segmentation while reducing dependence on complex teacher models.
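The two training stages described in the method overview can be sketched as loss functions. This is a hedged, pure-Python illustration, not the authors' code: the teacher's forward pass is abstracted away (each path is represented only by its per-pixel logits), the squared-L2 distance between softmax outputs is an assumed stand-in for the paper's distance loss, and `lam`, `alpha`, and the temperature `T` are hypothetical hyperparameters. The student term is the standard Hinton-style KD objective (cross-entropy plus temperature-scaled KL divergence), in line with the statement that student training follows standard distillation.

```python
import math
from typing import List, Sequence

PixelLogits = List[List[float]]  # N pixels x K classes


def log_softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Numerically stable log-softmax at temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def cross_entropy(logits: PixelLogits, targets: Sequence[int]) -> float:
    """Mean per-pixel cross-entropy against ground-truth class ids."""
    return -sum(log_softmax(z)[t] for z, t in zip(logits, targets)) / len(targets)


def dual_path_teacher_loss(logits_a: PixelLogits, logits_b: PixelLogits,
                           targets: Sequence[int], lam: float = 1.0) -> float:
    """Teacher objective: CE on both noisy-label paths plus lam * distance.
    Assumption: distance = squared L2 gap between per-pixel softmax outputs."""
    ce = cross_entropy(logits_a, targets) + cross_entropy(logits_b, targets)
    dist = 0.0
    for za, zb in zip(logits_a, logits_b):
        pa = [math.exp(v) for v in log_softmax(za)]
        pb = [math.exp(v) for v in log_softmax(zb)]
        dist += sum((x - y) ** 2 for x, y in zip(pa, pb))
    return ce + lam * dist / len(targets)


def student_kd_loss(student: PixelLogits, teacher: PixelLogits,
                    targets: Sequence[int], alpha: float = 1.0, T: float = 1.0) -> float:
    """Standard KD: CE on labels + alpha * T^2 * KL(teacher || student) at temperature T."""
    kl = 0.0
    for zs, zt in zip(student, teacher):
        ls, lt = log_softmax(zs, T), log_softmax(zt, T)
        kl += sum(math.exp(b) * (b - a) for a, b in zip(ls, lt))
    return cross_entropy(student, targets) + alpha * T ** 2 * kl / len(targets)


if __name__ == "__main__":
    targets = [0, 1]
    path_a = [[2.0, 0.0], [0.0, 1.0]]  # teacher logits given noisy label #1 (illustrative)
    path_b = [[1.5, 0.5], [0.2, 1.2]]  # teacher logits given noisy label #2 (illustrative)
    print(dual_path_teacher_loss(path_a, path_b, targets, lam=1.0))
    student = [[0.5, 0.0], [0.0, 0.3]]
    print(student_kd_loss(student, path_a, targets, alpha=1.0, T=2.0))
```

Setting `lam=0` reduces the teacher objective to plain supervised training on both paths; the distance term is what penalizes the two noisy-label paths for disagreeing.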