Abstract: Dataset distillation is a newly emerging task that synthesizes a small dataset for training deep neural networks (DNNs), reducing data storage and model training costs. The synthetic dataset is expected to capture the essence of the knowledge contained in the real-world dataset, so that training on the former yields performance similar to training on the latter. Recent advances in distillation methods have brought notable improvements in generating synthetic datasets. However, current state-of-the-art methods treat the entire synthetic dataset as a single unified entity and optimize every synthetic instance equally. This static optimization approach may degrade distillation performance. Specifically, we argue that static optimization can give rise to a coupling issue within the synthetic data, particularly when a larger amount of synthetic data is being optimized. This coupling issue, in turn, causes the distilled dataset to fail to capture the high-level features that the DNN learns in later epochs. In this study, we propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch), which tackles this problem by adaptively optimizing the synthetic data to encourage sequential acquisition of knowledge during dataset distillation. Our analysis indicates that SeqMatch effectively addresses the coupling issue by generating the synthetic instances sequentially, thereby significantly enhancing performance. SeqMatch outperforms state-of-the-art methods on various datasets, including SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet. Our code is available at https://github.com/shqii1j/seqmatch.
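The contrast between static and sequential optimization can be illustrated with a toy sketch. It is not the authors' SeqMatch implementation. It assumes the synthetic set is already partitioned into K subsets of plain vectors, and it replaces the real matching objective with a hypothetical stand-in: at stage k, the mean of the union of subsets 0..k should match a per-stage target vector. Only subset k is updated at stage k, while the earlier subsets stay frozen, so what they captured at earlier stages is not overwritten. The names `sequential_subset_matching`, `mean`, and `stage_targets`, as well as the toy data, are illustrative assumptions.

```python
import random


def mean(points):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sequential_subset_matching(subsets, stage_targets, lr=0.5, steps=500):
    """Toy sequential subset optimization (hypothetical, not the SeqMatch code).

    subsets[k] is a list of mutable vectors (lists of floats).
    stage_targets[k] is the vector the union subsets[0..k] should match.
    At stage k only subsets[k] is updated; earlier subsets stay frozen.
    """
    for k, target in enumerate(stage_targets):
        frozen = [p for s in subsets[:k] for p in s]  # already optimized, never updated here
        active = subsets[k]                           # the only trainable subset at this stage
        n = len(frozen) + len(active)
        for _ in range(steps):
            m = mean(frozen + active)
            # Stand-in loss: ||mean(union) - target||^2. Its gradient with
            # respect to every point in the union is 2 * (m - target) / n.
            grad = [2.0 * (mi - ti) / n for mi, ti in zip(m, target)]
            for p in active:  # apply the gradient step to the active subset only
                for i in range(len(p)):
                    p[i] -= lr * grad[i]
    return subsets


if __name__ == "__main__":
    random.seed(0)
    # Three subsets of two 2-D points each (made-up toy data).
    subsets = [[[random.gauss(0, 1) for _ in range(2)] for _ in range(2)]
               for _ in range(3)]
    targets = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    sequential_subset_matching(subsets, targets)
    for k, t in enumerate(targets):
        union = [p for s in subsets[:k + 1] for p in s]
        print(k, t, [round(v, 3) for v in mean(union)])
```

For each stage k, the script should print a union mean that matches the corresponding target up to rounding. Every stage's condition still holds at the end because later stages never touch earlier subsets. This freeze-and-extend loop, as opposed to updating all instances equally against every stage at once, is the design choice the sketch is meant to highlight.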

Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information

DCCD: Reducing Neural Network Redundancy Via Distillation

Using Less but Important Information for Feature Distillation

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Accelerating Dataset Distillation Via Model Augmentation

Dataset Distillation via the Wasserstein Metric

Dataset Distillation: A Comprehensive Review

Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

Exploring the Impact of Dataset Bias on Dataset Distillation

Dataset Distillation with Channel Efficient Process

Mitigating Bias in Dataset Distillation

D⁴M: Dataset Distillation via Disentangled Diffusion Model

Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

Curriculum Dataset Distillation

Enhancing Dataset Distillation via Label Inconsistency Elimination and Learning Pattern Refinement

Sequential Subset Matching for Dataset Distillation

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

MetaDD: Boosting Dataset Distillation with Neural Network Architecture-Invariant Generalization

Distilling Long-tailed Datasets