Abstract: Knowledge distillation (KD) aims to build a lightweight deep neural network under the guidance of a large-scale teacher model. Although KD improves model efficiency, the performance gap between a teacher model and the trained student model remains significant. This is because the teacher's knowledge is not transferred effectively to the student when the mapping landscape of the large-scale teacher model is not fully explored. To address this gap, we propose a novel Hybrid Mix-up Contrastive Knowledge Distillation (HMCKD) approach, which facilitates a thorough and reliable exploration of the mapping solution space in order to significantly improve the performance of the student model. Specifically, we design a hybrid mixing strategy, comprising image-level mixing and feature-level mixing, that forms a smoother mapping landscape and thereby provides stronger guidance for transferring the teacher model's richer dark knowledge to the student model. Additionally, we apply two other strategies, contrastive learning and top-k guided selection, to ensure more effective knowledge transfer from the teacher model. Extensive experiments show that the proposed HMCKD approach outperforms state-of-the-art knowledge distillation methods on six publicly available datasets: CIFAR-100, CIFAR-100-C, STL-10, SVHN, TinyImageNet, and ImageNet. In particular, on the CIFAR-100 dataset, the average accuracy of students trained with HMCKD increased by 1.47%. Further, both visualization results and similarity quantifications confirm the narrowed knowledge gap between the teacher and student models. Our source code is available at https://github.com/lambett/HMCKD .
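As a rough illustration of the ingredients named in the abstract, the following sketch combines generic mix-up (a convex combination of two inputs) with a standard temperature-softened distillation loss. It is not the HMCKD implementation: the function names, the toy linear "teacher" and "student" weights, the Beta(1, 1) mixing coefficient, and the temperature of 4 are all assumptions made for illustration, and contrastive learning and top-k guided selection are omitted because the abstract does not specify them.

```python
import math
import random


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def linear(weights, x):
    """Toy stand-in for a network: one linear layer mapping inputs to logits."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical toy models (3 classes, 4 input "pixels"); values are arbitrary.
TEACHER_W = [[1.0, 0.5, -0.5, -1.0], [-1.0, -0.5, 0.5, 1.0], [0.2, 0.2, 0.2, 0.2]]
STUDENT_W = [[0.8, 0.3, -0.2, -0.7], [-0.6, -0.4, 0.3, 0.9], [0.1, 0.3, 0.1, 0.3]]

random.seed(0)
lam = random.betavariate(1.0, 1.0)  # mixing coefficient, assumed Beta(1, 1)

# Image-level mixing: blend two flattened toy "images" before the forward pass.
image_a = [0.0, 0.2, 0.9, 1.0]
image_b = [1.0, 0.8, 0.1, 0.0]
mixed_image = mix(image_a, image_b, lam)

# The student is trained to match the teacher's soft predictions on the mix.
loss = kd_loss(linear(TEACHER_W, mixed_image), linear(STUDENT_W, mixed_image))
print(f"lambda={lam:.3f}, distillation loss on mixed input={loss:.4f}")
```

The same `mix` helper could in principle be applied to intermediate feature vectors instead of raw inputs, which is one plausible reading of feature-level mixing; how HMCKD actually combines the two levels is defined in the paper and its repository, not here.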
