Abstract: Knowledge distillation is a simple yet effective technique for deep model compression that transfers the knowledge learned by a large teacher model to a small student model. To mimic how a teacher teaches a student, existing knowledge distillation methods mainly adopt unidirectional knowledge transfer, where knowledge extracted from different intermediate layers of the teacher model is used to guide the student model. However, in real-world education, students learn more effectively through multi-stage learning with self-reflection, which current knowledge distillation methods ignore. Inspired by this, we devise a new knowledge distillation framework, multi-target knowledge distillation via student self-reflection (MTKD-SSR), which not only enhances the teacher's ability to unfold the knowledge to be distilled but also improves the student's capacity to digest that knowledge. Specifically, the proposed framework consists of three target knowledge distillation mechanisms: stage-wise channel distillation (SCD), stage-wise response distillation (SRD), and cross-stage review distillation (CRD). SCD and SRD transfer feature-based knowledge (i.e., channel features) and response-based knowledge (i.e., logits), respectively, at different stages; CRD encourages the student model to conduct self-reflective learning after each stage through self-distillation of the response-based knowledge. Experimental results on five popular visual recognition datasets (CIFAR-100, Market-1501, CUB200-2011, ImageNet, and Pascal VOC) demonstrate that the proposed framework significantly outperforms recent state-of-the-art knowledge distillation methods.
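
The abstract names three loss targets without giving their exact form, so the following is only a minimal sketch of how such terms are commonly written, not the authors' formulation. It assumes a temperature-softened KL divergence for the response-based term, a squared difference of per-channel mean activations for the feature-based term, and a review term that reuses the response loss with the student's previous-stage logits as the target. All function names, the temperature value, and the loss weights are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def response_loss(student_logits, target_logits, temperature=4.0):
    """Response-based term (assumed form): softened KL, scaled by T^2."""
    p_target = softmax(target_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_target, p_student)


def channel_loss(student_channels, teacher_channels):
    """Feature-based term (assumed form): mean squared gap between
    per-channel mean activations. Each argument is a list of channels,
    and each channel is a flat list of activations."""
    gaps = []
    for s, t in zip(student_channels, teacher_channels):
        gaps.append((sum(s) / len(s) - sum(t) / len(t)) ** 2)
    return sum(gaps) / len(gaps)


def review_loss(current_student_logits, previous_student_logits, temperature=4.0):
    """Self-distillation term (assumed direction): the student's logits
    from the previous stage act as the target for the current stage."""
    return response_loss(current_student_logits, previous_student_logits, temperature)


def stage_loss(student_channels, teacher_channels,
               student_logits, teacher_logits, previous_student_logits,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms for one stage (weights are hypothetical)."""
    w_scd, w_srd, w_crd = weights
    return (w_scd * channel_loss(student_channels, teacher_channels)
            + w_srd * response_loss(student_logits, teacher_logits)
            + w_crd * review_loss(student_logits, previous_student_logits))
```

In this sketch each term vanishes when the student matches its target exactly, so the combined stage loss is zero only when the student agrees with both the teacher and its own previous-stage predictions.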

Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation

An Embarrassingly Simple Approach for Knowledge Distillation

Multi-target Knowledge Distillation Via Student Self-reflection

Collaborative Knowledge Distillation Via Multiknowledge Transfer

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation

Boosting Knowledge Distillation Via Intra-class Logit Distribution Smoothing

Knowledge Condensation Distillation

Improving Knowledge Distillation With a Customized Teacher

Comparative Knowledge Distillation

Student-friendly Knowledge Distillation

Knowledge Distillation via Token-Level Relationship Graph Based on the Big Data Technologies

Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation

Adaptive Cross-Architecture Mutual Knowledge Distillation

Collaborative Teacher-Student Learning via Multiple Knowledge Transfer

Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models

Annealing Knowledge Distillation

Skill-transferring Knowledge Distillation Method

Online Knowledge Distillation via Collaborative Learning

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling