Abstract: Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress, existing dataset distillation methods often struggle with computational efficiency, scalability to complex high-resolution datasets, and generalizability to deep architectures. These approaches typically require retraining when the distillation ratio changes, as knowledge is embedded in raw pixels. In this paper, we propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images. The learned generative model can then produce informative training images for different distillation ratios and deep architectures. Extensive experiments on 15 datasets of varying resolutions show D2M's superior performance, re-distillation efficiency, and cross-architecture generalizability. Our method effectively scales up to high-resolution 128×128 ImageNet-1K. Furthermore, we verify D2M's practical benefits for downstream applications in neural architecture search.

What problem does this paper attempt to address?

This paper addresses three limitations of existing dataset distillation methods: computational efficiency, applicability to high-resolution large-scale datasets, and cross-architecture generalization. Specifically:

1. **Computational efficiency**: Existing dataset distillation methods must be retrained whenever the distillation ratio (i.e., the number of images per category) changes, which makes re-distillation computationally costly.
2. **Applicability to high-resolution large-scale datasets**: These methods usually struggle with high-resolution large-scale datasets (such as 128×128 ImageNet-1K), and the generated images often contain visual noise.
3. **Cross-architecture generalization**: The distilled datasets perform poorly when used to train different deep learning architectures (such as ResNet, DenseNet, and ViT).

To address these problems, the authors propose a new framework, Data-to-Model Distillation (D2M). D2M distills the knowledge of the real dataset into the parameter space of a generative model instead of storing it in raw pixels. This improves computational efficiency and lets the model generate high-quality synthetic images for datasets of different resolutions and for multiple deep learning architectures.

### Main contributions

1. **Novel framework**: A new framework that distills the knowledge of large-scale datasets into the parameter space of a generative model, which can then generate informative images for classification tasks and provide diverse supervision.
2. **Extensive experimental verification**: Experiments on 15 datasets with different resolutions, label complexities, and application domains achieve state-of-the-art results. Distillation scales to 128×128 ImageNet-1K, and the storage complexity remains fixed across settings.
3. **Efficiency and generalization**: D2M is shown to be superior in re-distillation efficiency and cross-architecture generalization, and the generated high-quality images significantly improve the performance of downstream tasks such as neural architecture search.

### Method overview

The core idea of D2M is to distill knowledge into the parameter space of the generative model instead of directly into raw pixels. The steps are as follows:

1. **Pre-train the generative model**: Pre-train a generative model using the standard GAN loss function.
2. **Distillation stage**:
   - Randomly sample a batch of noise vectors and labels and use them to generate synthetic images.
   - At the same time, randomly sample a batch of images and their labels from the real dataset.
   - Use a randomly initialized neural network from the model pool to extract features and predict classification logits for both batches.
   - Measure the differences between real and synthetic images with the embedding matching and prediction matching modules.
3. **Optimize the generative model**: Update the parameters of the generative model by minimizing the embedding matching loss and the prediction matching loss.

### Experimental results

D2M performs well on multiple datasets, especially on CIFAR-10/100 and Tiny ImageNet. It reaches nearly 88% of the upper-limit performance using only 1% of the training data, and about 70% of the upper-limit performance on Tiny ImageNet using 2% of the training data. On the higher-resolution ImageNet-1K and its subsets, D2M also significantly outperforms other methods, especially on the ImageWoof dataset, with a performance improvement of more than 2.7%.

In conclusion, D2M overcomes the limitations of existing dataset distillation methods by distilling knowledge into a generative model rather than into pixels, providing a new solution for data-efficient machine learning.
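The distillation stage in the method overview above can be illustrated with a small NumPy sketch of the two loss terms. The summary does not give the exact form of either loss, so everything here is an assumption made for illustration: embedding matching is taken to be the squared distance between batch-mean features, prediction matching is taken to be a KL divergence between batch-averaged, temperature-softened predictions, and the names `make_random_net`, `distillation_loss`, `pred_weight`, and `temperature` are hypothetical. The array `syn_images` stands in for the generator's output, and no gradient step is shown.

```python
import numpy as np

def make_random_net(rng, in_dim, feat_dim, num_classes):
    """Hypothetical stand-in for one randomly initialized network from the model pool."""
    w_feat = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, feat_dim))
    w_cls = rng.normal(0.0, 1.0 / np.sqrt(feat_dim), size=(feat_dim, num_classes))

    def forward(images):
        feats = np.maximum(images @ w_feat, 0.0)  # ReLU feature embedding
        logits = feats @ w_cls                    # classification logits
        return feats, logits

    return forward

def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def embedding_matching_loss(feats_real, feats_syn):
    # Assumed form: squared L2 distance between batch-mean embeddings.
    diff = feats_real.mean(axis=0) - feats_syn.mean(axis=0)
    return float(np.sum(diff ** 2))

def prediction_matching_loss(logits_real, logits_syn, temperature=4.0):
    # Assumed form: KL(p_real || p_syn) between batch-averaged softened predictions.
    p = softmax(logits_real, temperature).mean(axis=0)
    q = softmax(logits_syn, temperature).mean(axis=0)
    eps = 1e-12
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def distillation_loss(net, real_images, syn_images, pred_weight=1.0):
    """Return (total, embedding term, prediction term) for one batch pair."""
    feats_real, logits_real = net(real_images)
    feats_syn, logits_syn = net(syn_images)
    l_emb = embedding_matching_loss(feats_real, feats_syn)
    l_pred = prediction_matching_loss(logits_real, logits_syn)
    return l_emb + pred_weight * l_pred, l_emb, l_pred

rng = np.random.default_rng(0)
in_dim, feat_dim, num_classes, batch = 48, 16, 10, 32
net = make_random_net(rng, in_dim, feat_dim, num_classes)
real_images = rng.normal(size=(batch, in_dim))  # flattened toy "real" batch
syn_images = rng.normal(size=(batch, in_dim))   # stand-in for generator output
total, l_emb, l_pred = distillation_loss(net, real_images, syn_images)
print(f"embedding={l_emb:.4f} prediction={l_pred:.6f} total={total:.4f}")
```

In a full pipeline the total loss would be backpropagated through the feature network into the generator's parameters with an autodiff framework; the feature network itself stays randomly initialized and is not trained in this sketch.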

Data-to-Model Distillation: Data-Efficient Learning Framework

D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Data-Efficient Generation for Dataset Distillation

DiM: Distilling Dataset into Generative Model

One Category One Prompt: Dataset Distillation using Diffusion Models

Latent Dataset Distillation with Diffusion Models

Data-Free Adversarial Distillation

Generative Dataset Distillation: Balancing Global Structure and Local Details

Curriculum Dataset Distillation

Accelerating Dataset Distillation Via Model Augmentation

Generalizing Dataset Distillation via Deep Generative Prior

Data Distillation: A Survey

Dataset Distillation via Curriculum Data Synthesis in Large Data Era

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Efficient Dataset Distillation via Minimax Diffusion

DataDAM: Efficient Dataset Distillation with Attention Matching

Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

A Comprehensive Survey of Dataset Distillation

Data-Distortion Guided Self-Distillation for Deep Neural Networks