Classifier Reuse-Based Contrastive Knowledge Distillation

Zhipeng Gao, Shu Yang, Xinlei Yu, Zijia Mo, Lanlan Rui, Yang
DOI: https://doi.org/10.1109/edge62653.2024.00024
2024-01-01
Abstract: As a popular method for model lightweighting, knowledge distillation has developed extensively in recent years. Nevertheless, it relies heavily on labeled data for training, and without labeled data, traditional knowledge distillation faces significant hurdles. It is therefore important to explore how knowledge distillation can be applied in this label-free setting, with the aim of improving the performance of lightweight models in unsupervised edge scenarios. We integrate contrastive learning, currently a prevalent self-supervised learning technique, with knowledge distillation, strengthening the distillation task by assessing the similarity between sample pairs. In addition, we use a projection head to align features between the student and teacher models and employ the Siamese Classifier method so that the student model can reuse the pre-trained classifier. This removes the need to retrain the classifier and allows the student to acquire more knowledge from the teacher. Experimental results show that our model achieves state-of-the-art performance on the CIFAR-100 benchmark; in particular, when ResNet32$\times$4 serves as the teacher for ResNet8$\times$4, it surpasses other methods by $3.8\%$ on average.
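The abstract combines three ingredients: a contrastive loss over sample pairs, a projection head that maps student features into the teacher's feature space, and reuse of a pre-trained classifier on top of the projected student features. The pure-Python sketch below illustrates one plausible way these pieces fit together. It is not the authors' implementation. It assumes an InfoNCE-style loss, where the positive pair is the same sample seen by student and teacher and the other samples in the batch are negatives, a purely linear projection head, and a frozen linear classifier. The function names, the temperature value, and these design choices are all hypothetical, because the abstract does not specify them.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale v to unit length; a zero vector stays zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def project(student_feat: Sequence[float], proj_w: Matrix) -> Vector:
    """Hypothetical linear projection head: student space -> teacher space."""
    return matvec(proj_w, student_feat)


def contrastive_kd_loss(student_feats: List[Vector], teacher_feats: List[Vector],
                        proj_w: Matrix, temperature: float = 0.1) -> float:
    """InfoNCE-style distillation loss (an assumed form, not the paper's).

    For sample i, (projected student_i, teacher_i) is the positive pair and
    (projected student_i, teacher_j), j != i, are negatives. The loss is the
    batch mean of -log softmax(similarities / temperature) at index i.
    """
    s = [l2_normalize(project(f, proj_w)) for f in student_feats]
    t = [l2_normalize(f) for f in teacher_feats]
    n = len(s)
    total = 0.0
    for i in range(n):
        # Cosine similarity of student i against every teacher feature.
        logits = [sum(a * b for a, b in zip(s[i], t[j])) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


def reused_classifier_logits(student_feat: Sequence[float], proj_w: Matrix,
                             cls_w: Matrix, cls_b: Sequence[float]) -> Vector:
    """Feed the projected student feature into a frozen, pre-trained linear
    classifier (cls_w, cls_b), so no new classifier head is trained."""
    z = project(student_feat, proj_w)
    return [dot + b for dot, b in zip(matvec(cls_w, z), cls_b)]


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    student = [[2.0, 0.1], [0.1, 2.0], [-2.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    swap = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned projection loss:", contrastive_kd_loss(student, teacher, identity))
    print("misaligned projection loss:", contrastive_kd_loss(student, teacher, swap))
    print("reused classifier logits:",
          reused_classifier_logits(student[0], identity, identity, [0.0, 0.5]))
```

In this toy batch, a projection that aligns student features with their teacher counterparts yields a much lower loss than one that scrambles them. Training would adjust `proj_w` (and the student backbone) to reduce the loss while the reused classifier stays frozen. A real implementation would use a tensor library with automatic differentiation rather than Python lists.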
What problem does this paper attempt to address?