Hybrid Data-Free Knowledge Distillation

Jialiang Tang, Shuo Chen, Chen Gong
2024-12-18
Abstract: Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the teacher network's original training data. Existing collection-based and generation-based methods train student networks by collecting massive numbers of real examples and by generating synthetic examples, respectively. However, both inevitably become weak in practical scenarios because gathering or emulating sufficient real-world data is difficult. To solve this problem, we propose a novel method called Hybrid Data-Free Distillation (HiDFD), which leverages only a small amount of collected data while generating sufficient examples for training student networks. HiDFD comprises two primary modules, i.e., teacher-guided generation and student distillation. The teacher-guided generation module uses the teacher network to guide a Generative Adversarial Network (GAN) to produce high-quality synthetic examples from very few real-world collected examples. Specifically, we design a feature integration mechanism to prevent the GAN from overfitting and to facilitate reliable representation learning from the teacher network. Meanwhile, we devise a category frequency smoothing technique via the teacher network to balance the generative training of each category. In the student distillation module, we explore a data inflation strategy to properly utilize a blend of real and synthetic data to train the student network via a classifier-sharing-based feature alignment technique. Extensive experiments across multiple benchmarks demonstrate that HiDFD achieves state-of-the-art performance using 120 times less collected data than existing methods. Code is available at https://github.com/tangjialiang97/HiDFD.
Computer Vision and Pattern Recognition
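The abstract names a category frequency smoothing step but does not give its formula. The sketch below is one plausible reading, included only to make the idea concrete. It is not taken from the paper or its repository. It keeps an exponential moving average of how often the teacher assigns each category to generated examples, then turns that estimate into inverse-frequency weights, so categories the GAN under-produces receive more weight in its loss. The function names, the `momentum` parameter, and the inverse-frequency weighting are all assumptions.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch of category frequency smoothing; NOT the HiDFD code.


def update_class_frequency(
    running_freq: Sequence[float],
    teacher_preds: Sequence[int],
    momentum: float = 0.9,
) -> List[float]:
    """Exponential moving average of how often the teacher assigns each
    category to a batch of generated examples (assumed smoothing scheme)."""
    if not teacher_preds:
        return list(running_freq)  # nothing generated: keep the old estimate
    counts = Counter(teacher_preds)  # missing categories count as 0
    batch_freq = [counts[c] / len(teacher_preds) for c in range(len(running_freq))]
    return [
        momentum * old + (1.0 - momentum) * new
        for old, new in zip(running_freq, batch_freq)
    ]


def class_balancing_weights(
    running_freq: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Inverse-frequency weights rescaled to mean 1, so categories the
    generator under-produces count more in its loss (assumed weighting)."""
    inverse = [1.0 / (f + eps) for f in running_freq]
    mean_inverse = sum(inverse) / len(inverse)
    return [w / mean_inverse for w in inverse]


# Usage: three categories, starting from a uniform estimate.
num_classes = 3
freq = [1.0 / num_classes] * num_classes
# Teacher predictions on a generated batch that over-produces category 0.
freq = update_class_frequency(freq, [0, 0, 0, 1], momentum=0.5)
weights = class_balancing_weights(freq)
print(freq, weights)
```

Under this assumed scheme, category 2 never appears in the batch, so it ends up with the largest weight. Category 0 is over-produced and gets the smallest.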
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to effectively train a compact student network when the teacher network's original training data is inaccessible. Existing data-free knowledge distillation (DFKD) methods rely either on collected data or on generated data. Both kinds struggle in practical applications because sufficient real-world data is difficult to obtain or to simulate. To solve this problem, the authors propose a new method, Hybrid Data-Free Distillation (HiDFD). It requires only a small amount of collected real data and generates a sufficient amount of synthetic data to train the student network.

### Specific background of the problem

1. **The success of deep neural networks (DNNs) comes with significant computational and storage requirements**, which limits their deployment on resource-limited devices.
2. **Knowledge distillation (KD)** is an effective compression technique. It improves the performance of a lightweight student network by transferring knowledge from a complex pre-trained teacher network. In practice, however, the teacher's training data is usually unavailable because of privacy concerns, so only the pre-trained teacher network itself can be used to train the student network.
3. **Limitations of existing DFKD methods**:
   - **Collection-based methods** require a large number of real examples. In practical tasks such as medical image classification, obtaining sufficient training samples is very difficult.
   - **Generation-based methods** use the teacher network to guide a generative model that produces synthetic examples. Without supervision from real data, however, the generated examples may be of low quality, which degrades the student network's performance.

### HiDFD solution

To overcome these problems, the authors propose HiDFD, which consists of two main modules:

1. **Teacher-guided generation module**:
   - **Feature integration mechanism**: prevents the generative adversarial network (GAN) from overfitting and promotes reliable representation learning from the teacher network.
   - **Category frequency smoothing technique**: uses the teacher network to balance the generative training of each category.
2. **Student distillation module**:
   - **Data inflation strategy**: properly uses a blend of real and synthetic data to train the student network.
   - **Classifier-sharing-based feature alignment technique**: aligns the student network's features with those of the teacher network, thereby improving performance.

### Experimental results

Experiments show that HiDFD achieves state-of-the-art performance while using 120 times less collected data than existing methods. It performs particularly well on the challenging HAM and ImageNet datasets.

### Summary

HiDFD addresses the shortcomings of existing DFKD methods in practical applications by combining a small amount of real data with high-quality synthetic data. As a result, it can effectively train a reliable student network without the original training data.
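The student distillation module can be sketched as follows, under stated assumptions. This is not HiDFD's actual implementation. The sketch assumes that "classifier-sharing" means passing both the student's and the teacher's features through the teacher's frozen classifier head, then penalizing the KL divergence between the resulting predictions. It also assumes that "data inflation" means resampling the few collected real examples with replacement so that they fill a fixed share of every mixed batch. The helper names, `real_ratio`, `temperature`, and the toy 2-d features are hypothetical. The 2-d features stand in for student features already projected to the teacher's feature dimension.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

# Hypothetical sketch of a student distillation step; NOT the authors' code.
T = TypeVar("T")
Matrix = List[List[float]]


def linear_head(weight: Matrix, bias: Sequence[float], feature: Sequence[float]) -> List[float]:
    """Classifier head: logits = W @ feature + b."""
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def shared_classifier_loss(
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    weight: Matrix,
    bias: Sequence[float],
    temperature: float = 4.0,
) -> float:
    """Feed both feature vectors through the SAME frozen teacher head and
    penalize the student when its prediction differs (assumed formulation)."""
    p_teacher = softmax(linear_head(weight, bias, teacher_feat), temperature)
    p_student = softmax(linear_head(weight, bias, student_feat), temperature)
    return kl_divergence(p_teacher, p_student)


def inflate_batch(
    real: Sequence[T],
    synthetic: Sequence[T],
    batch_size: int,
    real_ratio: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Assumed data inflation: resample the few real examples with replacement
    so they fill a fixed share of every mixed real/synthetic batch."""
    if rng is None:
        rng = random.Random(0)
    n_real = max(1, round(batch_size * real_ratio))
    batch = rng.choices(list(real), k=n_real) + rng.sample(list(synthetic), k=batch_size - n_real)
    rng.shuffle(batch)
    return batch


# Usage with a toy 3-class head over 2-d features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
teacher_feat = [2.0, 0.5]
aligned = shared_classifier_loss([2.0, 0.5], teacher_feat, W, b)
misaligned = shared_classifier_loss([-1.0, 3.0], teacher_feat, W, b)
batch = inflate_batch(["real_a", "real_b"], [f"syn_{i}" for i in range(20)], batch_size=8)
print(aligned, misaligned, batch)
```

Reusing the teacher's head means the student only has to produce features that the teacher's classifier already interprets correctly. In the toy usage, identical features give zero loss and mismatched features do not. Resampling the real examples with replacement lets two real images appear in every batch of eight.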