Improving Knowledge Distillation via Transferring Learning Ability

Long Liu, Tong Li, Hui Cheng
2023-09-18
Abstract: Existing knowledge distillation methods generally use a teacher-student approach, where the student network solely learns from a well-trained teacher. However, this approach overlooks the inherent differences in learning abilities between the teacher and student networks, thus causing the capacity-gap problem. To address this limitation, we propose a novel method called SLKD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address two main issues in knowledge distillation: the capacity gap and the insufficient learning ability of the student network.

1. **Capacity Gap**: In traditional knowledge distillation methods, the teacher and student networks differ in structure, so their learning abilities usually differ as well. When the student's capacity is much smaller than the teacher's, this gap leads to poor student performance and limits the effectiveness of existing methods.
2. **Insufficient Learning Ability**: Because the student and teacher have different architectures, the student's performance in the initial stage of training differs significantly from the teacher's, which hampers how effectively it learns.

To address these issues, the authors propose a new knowledge distillation framework, Self-Learning Teacher Knowledge Distillation (SLKD). The method introduces a self-learning teacher network (Self-Learning Teacher, SL-T), which has the same architecture as the teacher network but is not pre-trained. During distillation, SL-T plays two roles at once: it learns from the teacher network as a student, and it guides the student network as a teacher. According to the authors, this lets the student network acquire a learning ability similar to the teacher's, which alleviates both the capacity gap and the insufficient learning ability and significantly improves the effectiveness of knowledge distillation.

Experimental results show that SLKD outperforms various existing knowledge distillation methods on the CIFAR-100 and ImageNet datasets.
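The dual role of SL-T can be illustrated with a minimal sketch in plain Python. This summary does not state the paper's loss functions, so the sketch assumes the classic temperature-softened distillation loss (cross-entropy on the label plus a KL term against the softened teacher output); the function names, the `temperature` and `alpha` values, and the choice to let the student learn only from SL-T are all illustrative assumptions, not the authors' formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, label):
    """Cross-entropy of the (unsoftened) prediction against a hard label."""
    return -math.log(softmax(logits)[label] + 1e-12)

def kd_loss(learner_logits, guide_logits, label, temperature=4.0, alpha=0.5):
    """Assumed generic distillation loss: hard-label CE plus softened KL.

    The T^2 factor is the usual scaling that keeps the soft-target term
    comparable in magnitude across temperatures.
    """
    soft_guide = softmax(guide_logits, temperature)
    soft_learner = softmax(learner_logits, temperature)
    distill = temperature ** 2 * kl_divergence(soft_guide, soft_learner)
    return alpha * cross_entropy(learner_logits, label) + (1 - alpha) * distill

def slkd_losses(teacher_logits, sl_teacher_logits, student_logits, label):
    """Hypothetical per-sample losses for the three-network arrangement.

    - SL-T acts as a student of the pre-trained (frozen) teacher.
    - The student is guided by SL-T, which acts as its teacher.
    """
    sl_teacher_loss = kd_loss(sl_teacher_logits, teacher_logits, label)
    student_loss = kd_loss(student_logits, sl_teacher_logits, label)
    return sl_teacher_loss, student_loss

if __name__ == "__main__":
    # Made-up logits for a 3-class example with true class 0.
    sl_t, stu = slkd_losses([5.0, 1.0, 0.0], [2.0, 1.0, 0.5], [0.5, 0.4, 0.3], 0)
    print(f"SL-T loss: {sl_t:.4f}, student loss: {stu:.4f}")
```

In a real training loop both losses would be minimised in the same step, with gradients from `sl_teacher_loss` updating only SL-T and gradients from `student_loss` updating only the student, so the guidance the student receives starts near its own level and sharpens as SL-T improves.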