Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation

Yinan Tang, Zhenhua Guo, Li Wang, Baoyu Fan, Fang Cao, Kai Gao, Hongwei Zhang, Rengang Li
DOI: https://doi.org/10.1145/3688863.3689569
2024-01-01
Abstract: Knowledge Distillation (KD), which extracts knowledge from a well-performing large neural network (a.k.a. the teacher network) to guide the training of a small network (a.k.a. the student network), has emerged as a promising approach for transfer learning and model compression. Unlike previous KD works, which focus on how to better transfer existing knowledge from the teacher network to the student network, we enhance KD by augmenting and distilling extra knowledge. In this paper, we propose Knowledge Augmentation for Distillation (KAD), which mines and transfers augmented knowledge by generating augmented samples. We further enhance KAD with a metric learning method called N-pair loss, which makes full use of the augmented samples and boosts the compressed student network based on the N-pair structure. We perform extensive experiments on widely used image benchmarks, and the results show that KAD not only works flexibly with various existing KD methods, but also achieves consistent improvements in classification accuracy.
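The two loss terms the abstract names can be illustrated with a minimal sketch. It shows the commonly used temperature-softened KD loss and the standard N-pair loss, not the exact KAD objective, which the abstract does not specify. The function names (`softmax`, `kd_loss`, `n_pair_loss`), the temperature value, and the choice of an augmented view of the same sample as the positive are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the widely used generic KD term; the temperature of 4.0 is an
    assumed default, not a value taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def n_pair_loss(anchor, positive, negatives):
    """Standard N-pair loss: log(1 + sum_i exp(a.n_i - a.p)).

    Assumption: `positive` is the embedding of an augmented view of the
    anchor's sample and `negatives` are embeddings of other samples.
    """
    pos = dot(anchor, positive)
    return math.log1p(sum(math.exp(dot(anchor, n) - pos) for n in negatives))
```

In a training loop these terms would typically be added to the usual cross-entropy loss with weighting coefficients; how KAD weights and combines them is not stated in the abstract.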