Abstract: Data-free knowledge distillation (DFKD) improves a student model (S) by having it mimic the class probabilities of a pre-trained teacher model (T) without access to the training data. In this setting, the ideal scenario is that T helps a generator (G) produce "good" samples that maximally benefit S. However, existing methods suffer from non-ideal generated samples because the gap between the class probabilities of T and S may be either too large or too small: samples with too large a gap may carry excessive information for S, while too small a gap yields samples with limited knowledge, resulting in poor generalization. Moreover, these methods cannot judge the "goodness" of the generated samples for S, since the fixed T is not necessarily ideal. In this paper, we aim to answer two questions: what is inside the gap box, and how can "good" samples be generated for DFKD? To this end, we propose a Gap-Sensitive Sample Generation (GapSSG) approach that revisits the empirical distilled risk from a data-free perspective. This analysis confirms the existence of an ideal teacher (T*) and theoretically implies that (1) the gap disturbance originates from the mismatch between T and T*, so the class probabilities of T can be used to approximate those of T*; and (2) "good" samples should maximally benefit S via T's class probabilities, since T* is unknown. Accordingly, we unpack the gap box between T and S into two findings: an inherent gap, which perceives T and T*, and a derived gap, which monitors S and T*. Benefiting from the derived gap, which focuses on the adaptability of generated samples to S, we track the student's training route (a series of training epochs) to capture the category distribution of S; based on this, a regulatory factor is devised to approximate T* over the inherent gap, so as to generate "good" samples for S. Furthermore, during the distillation process, a sample-balanced strategy is introduced to tackle the overfitting and missing-knowledge issues between the generated partial and critical samples by training G. Theoretical and empirical studies verify the advantages of GapSSG over state-of-the-art methods.
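As a rough illustration of the "gap between the class probabilities of T and S" discussed in the abstract, the sketch below measures a per-sample gap as the KL divergence between teacher and student softmax outputs. This is only an assumed, generic way to quantify such a gap; it is not the GapSSG formulation, and the function names (`softmax`, `kl_gap`) and the logit values are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into class probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_gap(teacher_logits, student_logits, temperature=1.0):
    """KL(T || S) between teacher and student class probabilities for one sample.

    Assumption: KL divergence stands in for the "gap"; the abstract does not
    specify which measure the method actually uses.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)

# Hypothetical logits for three generated samples (made-up numbers).
teacher = [[4.0, 0.5, 0.1], [2.0, 1.8, 0.2], [1.0, 1.0, 1.0]]
student = [[0.2, 3.5, 0.1], [1.9, 1.7, 0.3], [1.0, 1.0, 1.0]]

# One gap value per generated sample.
gaps = [kl_gap(t, s) for t, s in zip(teacher, student)]
for i, g in enumerate(gaps):
    print(f"sample {i}: gap = {g:.4f}")
```

In this toy setting, the first sample (teacher and student disagree on the top class) has a large gap, the second has a small one, and the third has none; a gap-aware generator would, in spirit, try to avoid both extremes.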

Sampling to Distill: Knowledge Transfer from Open-World Data

Up to 100x Faster Data-Free Knowledge Distillation

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

AdaDFKD: Exploring adaptive inter-sample relationship in data-free knowledge distillation

Small Scale Data-Free Knowledge Distillation

Data-Free Adversarial Distillation

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Hybrid Data-Free Knowledge Distillation

Stage-by-stage Knowledge Distillation

Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy

Robustness and Diversity Seeking Data-Free Knowledge Distillation

Better Together: Data-Free Multi-Student Coevolved Distillation

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

CDFKD-MFS: Collaborative Data-free Knowledge Distillation via Multi-level Feature Sharing

An Embarrassingly Simple Approach for Knowledge Distillation

Semi-Online Knowledge Distillation

Unpacking the Gap Box Against Data-Free Knowledge Distillation

Deep Collective Knowledge Distillation