Adaptive Informative Semantic Knowledge Transfer for Knowledge Distillation

Ruijian Xu, Ning Jiang, Jialiang Tang, Xinlei Huang
DOI: https://doi.org/10.1109/iscas58744.2024.10557974
2024-01-01
Abstract: Knowledge distillation aims to improve the generalization capacity of a student model by transferring knowledge from a teacher model. Existing feature-based methods transfer knowledge through hand-crafted feature mappings between teacher-student pairs. However, the volume of knowledge varies across layers, and the knowledge in different layers exhibits semantic gaps. As a result, hand-crafted layer associations may not allow the student model to learn effectively from the teacher model. We address this problem from two angles. First, to maximize knowledge transfer, we propose adaptive feature mapping based on the effective receptive field, which quantifies the knowledge volume of different layers and thereby establishes the optimal knowledge transfer paths between teacher-student pairs. Second, to strengthen the student model's ability to learn knowledge that exhibits semantic gaps, we propose adaptive feature fusion, which fuses multiple intermediate layers of the teacher model as additional supervision. Experimental results demonstrate that the proposed method can significantly improve the performance of the student model.
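The abstract does not spell out how the teacher's intermediate layers are fused, so the following is only a minimal sketch of the general idea of using fused teacher features as extra supervision. It assumes, purely for illustration, that the teacher features have already been projected to a common length, that fusion is a softmax-weighted average, and that the supervision signal is a mean squared error; the names `fuse_teacher_features`, `scores`, and `mse` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw per-layer scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teacher_features(teacher_feats, scores):
    """Weighted average of equal-length teacher feature vectors.

    Assumption: softmax-weighted averaging stands in for the paper's
    adaptive feature fusion, whose exact form the abstract does not give.
    """
    weights = softmax(scores)
    dim = len(teacher_feats[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, teacher_feats))
        for i in range(dim)
    ]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy example: two teacher layers, equal scores, so equal weights.
teacher_feats = [[1.0, 0.0], [0.0, 1.0]]
scores = [0.0, 0.0]
fused = fuse_teacher_features(teacher_feats, scores)

# The student feature is compared against the fused teacher feature.
student_feat = [0.5, 0.5]
loss = mse(student_feat, fused)
```

In a real training setup the scores would typically be learned or derived from a per-layer measure, and the loss would be added to the usual distillation objective; both choices are left open here because the abstract does not specify them.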