DiM: Distilling Dataset into Generative Model

Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang, Yang You

2023-10-11

Abstract: Dataset distillation reduces the network training cost by synthesizing small and informative datasets from large-scale ones. Despite the success of recent dataset distillation algorithms, three drawbacks still limit their wider application: (i) the synthetic images perform poorly on large architectures; (ii) they need to be re-optimized when the distillation ratio changes; (iii) the limited diversity restricts the performance when the distillation ratio is large. In this paper, we propose a novel distillation scheme to Distill information of large training sets into generative Models, named DiM. Specifically, DiM learns to use a generative model to store the information of the target dataset. During the distillation phase, we minimize the differences in logits predicted by a pool of models between real and generated images. At the deployment stage, the generative model synthesizes various training samples from random noise on the fly. Owing to this simple yet effective design, the trained DiM can be directly applied to different distillation ratios and large architectures without extra cost. We validate the proposed DiM across four datasets and achieve state-of-the-art results on all of them. To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than on simple ones, such as 75.1% with ResNet-18 and 72.6% with ConvNet-3 on ten images per class of CIFAR-10. In addition, DiM outperforms previous methods by 10% to 22% at 1 and 10 images per class on the SVHN dataset.
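The logit-matching objective described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy linear classifiers standing in for the pool of models, the choice of one randomly drawn model per step, the L1 distance between batch-mean logits, and the names `linear_logits`, `mean_logits`, and `logit_matching_loss`. The gradient update of the generator is omitted.

```python
import random

def linear_logits(weights, x):
    """Logits of a toy linear classifier: one weight row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mean_logits(weights, batch):
    """Average the logits over a batch of flattened images."""
    per_sample = [linear_logits(weights, x) for x in batch]
    n = len(per_sample)
    return [sum(col) / n for col in zip(*per_sample)]

def logit_matching_loss(model_pool, real_batch, generated_batch, rng):
    """Mean absolute gap between batch-mean logits on real and generated images.

    One model is drawn from the pool per call (an assumption of this sketch).
    """
    weights = rng.choice(model_pool)
    real = mean_logits(weights, real_batch)
    fake = mean_logits(weights, generated_batch)
    return sum(abs(r - f) for r, f in zip(real, fake)) / len(real)

rng = random.Random(0)
dim, num_classes = 8, 3

# Hypothetical pool of four random linear classifiers.
model_pool = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
    for _ in range(4)
]
# Random vectors stand in for real images and generator outputs.
real_batch = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
generated_batch = [[rng.gauss(0.5, 1) for _ in range(dim)] for _ in range(16)]

loss = logit_matching_loss(model_pool, real_batch, generated_batch, rng)
same = logit_matching_loss(model_pool, real_batch, real_batch, rng)
```

In this sketch the loss is zero when the two batches are identical and grows as the generated batch drifts away from the real one in logit space; in an actual training loop that value would be backpropagated into the generator.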

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper targets three limitations of existing dataset distillation methods:

1. **Poor performance of synthetic images on large architectures**: Images produced by existing dataset distillation methods perform poorly when used to train larger networks such as ResNet, VGG, and DenseNet.
2. **Re-optimization for each distillation ratio**: When the distillation ratio changes, existing methods typically have to rerun the distillation phase.
3. **Limited diversity at large distillation ratios**: A fixed synthetic set offers limited diversity, which restricts performance when the distillation ratio is large.

To address these issues, the authors propose DiM, a framework that distills the information of a large training set into a generative model instead of directly optimizing a fixed set of synthetic images. During distillation, DiM trains the generator by minimizing the difference between the logits that a pool of models predicts for real images and for generated images. At deployment, the generator synthesizes training samples from random noise on the fly, so the same trained model can serve different distillation ratios and large architectures without additional cost. The reported experiments show state-of-the-art results on all four evaluated datasets and more stable performance across architectures.
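The deployment property, one trained generator serving any distillation ratio, can be sketched as follows. This is an illustrative toy, not the authors' implementation: `ToyConditionalGenerator` and `synthesize` are hypothetical names, the class-conditional interface is an assumption, and fixed per-class prototypes stand in for learned generator weights.

```python
import random

class ToyConditionalGenerator:
    """Hypothetical stand-in for a trained class-conditional generator."""

    def __init__(self, num_classes, dim, seed=0):
        rng = random.Random(seed)
        # One fixed prototype per class plays the role of learned weights.
        self.prototypes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)
        ]
        self.dim = dim

    def sample(self, label, rng):
        """Map fresh random noise to a sample of the requested class."""
        noise = [rng.gauss(0, 1) for _ in range(self.dim)]
        return [p + 0.1 * z for p, z in zip(self.prototypes[label], noise)]

def synthesize(generator, images_per_class, rng):
    """Draw a labeled training set of the requested size on the fly."""
    data = []
    for label in range(len(generator.prototypes)):
        for _ in range(images_per_class):
            data.append((generator.sample(label, rng), label))
    return data

generator = ToyConditionalGenerator(num_classes=10, dim=8)
rng = random.Random(1)

# The same generator serves several images-per-class settings
# without any re-optimization.
sizes = {ipc: len(synthesize(generator, ipc, rng)) for ipc in (1, 10, 50)}

# Two draws for the same class differ because the noise is resampled.
first = generator.sample(0, rng)
second = generator.sample(0, rng)
```

Changing the images-per-class setting only changes how many noise vectors are drawn, which is the sense in which no extra distillation cost is incurred in this sketch.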

Data-to-Model Distillation: Data-Efficient Learning Framework

Generative Dataset Distillation: Balancing Global Structure and Local Details

D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Generalizing Dataset Distillation via Deep Generative Prior

Efficient Dataset Distillation via Minimax Diffusion

Data-Efficient Generation for Dataset Distillation

Curriculum Dataset Distillation

Accelerating Dataset Distillation Via Model Augmentation

Latent Dataset Distillation with Diffusion Models

Masked Generative Distillation

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation

Improved Distribution Matching Distillation for Fast Image Synthesis

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Dataset Distillation via Curriculum Data Synthesis in Large Data Era

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Data-Free Adversarial Distillation

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Generative Dataset Distillation Based on Diffusion Model

One Category One Prompt: Dataset Distillation using Diffusion Models