Abstract: Dataset distillation aims to compress the information in a large-scale original dataset into a new, compact dataset while preserving as much of the original data's informational essence as possible. Previous studies have predominantly concentrated on aligning intermediate statistics between the original and distilled data, such as weight trajectories, features, gradients, and BatchNorm statistics. In this work, we address this task through the new lens of model informativeness in the compression stage, i.e., pretraining on the original dataset. We observe that with the prior state-of-the-art SRe$^2$L, as model size increases, it becomes increasingly challenging for supervised pretrained models to recover learned information during data synthesis, because the channel-wise mean and variance inside the model flatten and become less informative. We further notice that the larger variances in the BN statistics of self-supervised models enable larger loss signals for updating the recovered data by gradients, providing more informativeness during synthesis. Building on this observation, we introduce SC-DD, a simple yet effective Self-supervised Compression framework for Dataset Distillation that facilitates more diverse information compression and recovery than traditional supervised learning schemes and further reaps the potential of large pretrained models with enhanced capabilities. Extensive experiments are conducted on the CIFAR-100, Tiny-ImageNet, and ImageNet-1K datasets to demonstrate the superiority of our proposed approach. When employing larger models, the proposed SC-DD outperforms all previous state-of-the-art supervised dataset distillation methods, such as SRe$^2$L, MTT, TESLA, DC, and CAFE, by large margins under the same recovery and post-training budgets. Code is available at
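
To make the BN-statistics argument concrete, the following is a minimal sketch of a BatchNorm-matching loss of the kind used during data synthesis. It is not the authors' implementation. It assumes the loss is the L2 distance between the channel-wise mean and variance of the synthetic batch and the statistics stored in the pretrained model. The function names (`channel_stats`, `bn_matching_loss`) and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import fmean, pvariance


def channel_stats(batch):
    """Per-channel mean and variance, pooled over samples and positions.

    `batch` is a list of samples; each sample is a list of channels;
    each channel is a list of activation values.
    """
    num_channels = len(batch[0])
    means, variances = [], []
    for c in range(num_channels):
        values = [v for sample in batch for v in sample[c]]
        means.append(fmean(values))
        variances.append(pvariance(values))
    return means, variances


def l2_distance(a, b):
    """Euclidean distance between two equal-length lists."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bn_matching_loss(batch, running_mean, running_var):
    """Assumed form of the loss: ||mu(x) - RM||_2 + ||var(x) - RV||_2."""
    means, variances = channel_stats(batch)
    return l2_distance(means, running_mean) + l2_distance(variances, running_var)


# Toy synthetic batch: 2 samples, 2 channels, 2 positions, all zeros.
zero_batch = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

# Hypothetical stored statistics: one "flat" set, one that varies across channels.
flat_loss = bn_matching_loss(zero_batch, [0.1, 0.1], [1.0, 1.0])
spread_loss = bn_matching_loss(zero_batch, [-2.0, 2.0], [0.5, 4.0])

print(f"flat statistics:   {flat_loss:.4f}")
print(f"spread statistics: {spread_loss:.4f}")
```

In this toy setting, the stored statistics that vary more across channels produce the larger loss for the same starting batch. This only illustrates the mechanism described in the abstract and is not evidence for the reported results.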

Dataset Distillation with Channel Efficient Process

DCCD: Reducing Neural Network Redundancy Via Distillation

Dataset Distillation in Latent Space

Dataset Distillation: A Comprehensive Review

Accelerating Dataset Distillation Via Model Augmentation

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

Data-Efficient Generation for Dataset Distillation

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Dataset Distillation via the Wasserstein Metric

A Comprehensive Survey of Dataset Distillation

What is Dataset Distillation Learning?

Dataset Distillation via Factorization

Latent Dataset Distillation with Diffusion Models

Self-supervised Dataset Distillation: A Good Compression Is All You Need

Dataset Condensation with Distribution Matching

Enhancing Dataset Distillation via Label Inconsistency Elimination and Learning Pattern Refinement

Data-to-Model Distillation: Data-Efficient Learning Framework

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Distributional Dataset Distillation with Subtask Decomposition

D$^4$M: Dataset Distillation via Disentangled Diffusion Model