Implicit Feature Alignment for Knowledge Distillation.

Dingyao Chen, Mengzhu Wang, Xiang Zhang, Tianyi Liang, Zhigang Luo
DOI: https://doi.org/10.1109/ictai56018.2022.00066
2022-01-01
Abstract: Knowledge distillation is a technique for transferring knowledge from a large teacher network to a lightweight student network. Existing studies rely purely on intermediate layers' features for distillation and may fail to capture sufficient semantic knowledge from the teacher. Inspired by recent advances in contrastive learning, we propose to add extra lightweight embedding layers to the teacher to strengthen its generalization ability, and to further align mixup-type features for knowledge distillation in an implicit fashion (IFKD). IFKD allows the student to learn richer structural knowledge, thanks to the learned embedding layers of the teacher. Crucially, by benefiting from a large number of mixed samples, we can mine more of the teacher's semantic knowledge. For efficiency, we propose a simple reversed mixup scheme to organize images and implicitly ensure complete positive information comparisons. Extensive experiments on image classification on two popular datasets, CIFAR-100 and ImageNet, verify the effectiveness of our approach compared with previous methods.
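The abstract does not spell out the IFKD objective or the exact reversed mixup scheme, so the following is only a minimal sketch of the two generic ingredients it names. It assumes the standard temperature-softened distillation loss and reads "reversed mixup" as mixing each sample with its counterpart from the same batch in reverse order; that pairing, the function names `kd_loss` and `reversed_mixup`, and the parameters `temperature` and `lam` are illustrative assumptions, not the authors' implementation. The teacher embedding layers and the contrastive alignment are not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Generic distillation loss: KL(teacher || student) on softened outputs.

    The T^2 factor is the usual scaling for softened targets; this is the
    standard loss, not necessarily the one used by IFKD.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def reversed_mixup(batch, lam):
    """Mix sample i with sample (n - 1 - i) of the same batch.

    ASSUMPTION: pairing with the reversed batch is one plausible reading of
    "reversed mixup"; the abstract does not define the scheme.
    """
    partners = batch[::-1]
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
        for x, y in zip(batch, partners)
    ]


if __name__ == "__main__":
    # Toy "images" flattened to feature lists.
    batch = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    mixed = reversed_mixup(batch, lam=0.7)
    print(mixed)
    print(kd_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

Under this reading, pairing by reversal needs no random permutation and no extra forward pass for the partner images, which is one way a mixup scheme could be made cheap; whether that matches the paper's motivation is not stated in the abstract.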