Holistic Weighted Distillation for Semantic Segmentation

Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen
DOI: https://doi.org/10.1109/ICME55011.2023.00075
2023-01-01
Abstract: Channel-wise distillation for semantic segmentation has proven more effective than spatial-based distillation. By removing redundant information from the teacher model, it lets the student focus on specific channel-related pixels, which can be viewed as a weighting of the pixels. However, standard channel-wise distillation ignores the fact that such differences in importance also exist among channels. In this paper, we propose a novel method called Holistic Weighted Distillation (HWD) to address this issue. We calculate the channel divergences between the teacher and the student and convert them into distillation weights, so that the student focuses more on learning channels it has not yet mastered, thus improving the final model performance. Moreover, our method introduces no additional network structure or back-propagation process, which improves training efficiency. Experiments on ADE20K, Cityscapes, and COCO-Stuff demonstrate the superiority of our method. The code is available at https://github.com/zju-SWJ/HWD.
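The weighting idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes that each channel's divergence is the KL divergence between spatially softmax-normalized teacher and student activations, and that weights are obtained by a softmax over those divergences, rescaled so they average to one. The function names, both temperature parameters, and the softmax-based weighting rule are hypothetical choices made for illustration only.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channel_kl(teacher_channel, student_channel, temperature=1.0):
    """KL(teacher || student) for one channel.

    Each channel is a flattened list of spatial activations, turned into a
    distribution over pixels with a softmax.
    """
    p = softmax(teacher_channel, temperature)
    q = softmax(student_channel, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_channel_loss(teacher, student, temperature=1.0, weight_temperature=1.0):
    """Weight per-channel divergences so poorly matched channels count more.

    ASSUMPTION: weights = num_channels * softmax(divergences). The paper may
    use a different conversion from divergences to weights.
    """
    divergences = [channel_kl(t, s, temperature) for t, s in zip(teacher, student)]
    # The weights are derived from the divergences but act as plain constants
    # here; in an autograd framework they would be detached from the graph.
    raw = softmax(divergences, weight_temperature)
    weights = [len(raw) * w for w in raw]
    loss = sum(w * d for w, d in zip(weights, divergences)) / len(divergences)
    return loss, weights, divergences


if __name__ == "__main__":
    # Two channels, four pixels each: channel 0 matches, channel 1 does not.
    teacher = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 2.0, 0.1]]
    student = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.1, 0.5, 0.5]]
    loss, weights, divergences = weighted_channel_loss(teacher, student)
    print("divergences:", divergences)
    print("weights:", weights)
    print("loss:", loss)
```

In this toy input the first channel is already matched, so its divergence is zero and it receives the smaller weight, while the mismatched second channel dominates the loss. Because the weights are computed directly from the divergences, the sketch needs no extra network or additional backward pass, which is consistent with the efficiency claim in the abstract.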