Category contrastive distillation with self-supervised classification

Weiwei Chen, Jiazhen Xu, Yujie Zheng, Chong Wang
DOI: https://doi.org/10.1007/s11760-024-03678-0
IF: 1.583
2024-12-11
Signal Image and Video Processing
Abstract: Knowledge distillation (KD) is a technique for transferring knowledge from a teacher model to a student model, commonly used in model compression and transfer learning. While existing KD methods have primarily focused on inter-sample or intra-sample knowledge without fully leveraging supervised labels, we present a novel approach that integrates self-supervised tasks into the KD framework to develop a category-based distillation algorithm. Our approach introduces two distinct memory banks for storing category embeddings predicted by the teacher and student models. By utilizing these memory banks, the student model absorbs knowledge across batches, enhancing the learning process through category embedding contrast rather than relying solely on intra-model knowledge transfer. Additionally, we incorporate self-supervised tasks with diverse data augmentation strategies to guide training in a multi-task framework. Experimental results demonstrate that our method achieves significant improvements, outperforming leading sample-based KD techniques. Specifically, our approach shows superior performance on both CIFAR-100 and ImageNet datasets.
Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
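
The category-level contrast described in the abstract (two memory banks of category embeddings, one for the teacher and one for the student, compared across batches) might look roughly like the sketch below. The abstract does not give the update rule or the loss, so everything specific here is an assumption made for illustration: the class name `CategoryMemoryBank`, the momentum update of per-category batch means, the InfoNCE-style loss, and the temperature value. It is written in plain Python to show the arithmetic only and is not differentiable; it should not be read as the authors' implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


class CategoryMemoryBank:
    """Hypothetical bank holding one unit-norm embedding per category."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.momentum = momentum  # assumed update rule, not from the paper
        self.bank = [[0.0] * dim for _ in range(num_classes)]
        self.seen = [False] * num_classes

    def update(self, embeddings, labels):
        # Average the batch embeddings per category.
        sums, counts = {}, {}
        for e, y in zip(embeddings, labels):
            acc = sums.setdefault(y, [0.0] * len(e))
            for i, x in enumerate(e):
                acc[i] += x
            counts[y] = counts.get(y, 0) + 1
        # Blend each batch mean into the stored entry so knowledge
        # accumulates across batches.
        for y, acc in sums.items():
            mean = [x / counts[y] for x in acc]
            if self.seen[y]:
                m = self.momentum
                mean = [m * old + (1 - m) * new
                        for old, new in zip(self.bank[y], mean)]
            self.bank[y] = normalize(mean)
            self.seen[y] = True


def category_contrastive_loss(student, teacher, temperature=0.1):
    """Assumed InfoNCE-style loss: student category c should be closest
    to teacher category c among all categories seen by both banks."""
    cats = [c for c in range(len(student.bank))
            if student.seen[c] and teacher.seen[c]]
    total = 0.0
    for pos, c in enumerate(cats):
        logits = [
            sum(a * b for a, b in zip(student.bank[c], teacher.bank[k]))
            / temperature
            for k in cats
        ]
        mx = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
        total += log_denom - logits[pos]
    return total / max(len(cats), 1)
```

In this sketch, a student bank whose category embeddings line up with the teacher's yields a loss near zero, while a student that confuses two categories is penalized heavily. In a real training loop the same quantity would be computed with a tensor library so that gradients reach the student network.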