Abstract: Dataset distillation aims to compress a training dataset by creating a small number of informative synthetic samples such that neural networks trained on them perform as well as those trained on the original training dataset. Current text dataset distillation methods create each synthetic sample as a sequence of word embeddings instead of a text to apply gradient-based optimization; however, such embedding-level distilled datasets cannot be used for training other models whose word embedding weights are different from the model used for distillation. To address this issue, we propose a novel text dataset distillation approach, called Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data, instead of directly optimizing synthetic samples. We evaluated DiLM on various text classification datasets and showed that distilled synthetic datasets from DiLM outperform those from current coreset selection methods. DiLM achieved remarkable generalization performance in training different types of models and in-context learning of large language models. Our code will be available at

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper attempts to solve several key problems in text dataset distillation:

1. **Limitations of existing methods**:
   - Current text dataset distillation methods usually create synthetic samples by optimizing word embeddings rather than generating text directly. Such samples cannot be used to train models whose word embedding weights differ from those of the model used for distillation, which limits their flexibility in practice.
   - The resulting word embedding sequences are completely unreadable to humans, making it difficult to interpret and analyze the original training dataset from them.
2. **Challenges in optimizing discrete text**:
   - Because text is discrete, it is very difficult to optimize text directly with gradient-based methods. Existing methods bypass this problem by optimizing continuous word embeddings, but the resulting samples are tied to the embedding weights of the model used for distillation.
3. **Model-independent applications**:
   - The researchers aim to develop a method that produces text-level synthetic datasets, which can be used to train different types of models, not just one specific pre-trained model.

### Solutions

To overcome the above problems, the paper proposes a new text dataset distillation method called "Distilling dataset into Language Model (DiLM)". The main contributions of DiLM are:

1. **Generating text-level synthetic datasets**:
   - DiLM trains a language model to generate text-level synthetic samples rather than directly optimizing word embeddings. The generated synthetic datasets can therefore be used to train models with different word embedding weights, which improves model independence.
2. **Optimization method**:
   - To overcome the optimization difficulties caused by the discreteness of text, DiLM trains the language model by minimizing the gradient matching loss between the generated samples and the real samples. By designing a differentiable back-propagation path, DiLM can effectively optimize the language model parameters.
3. **Experimental verification**:
   - The researchers conducted experiments on multiple text classification datasets. The results show that the synthetic datasets generated by DiLM outperform those from current coreset selection methods when training the same model, and also generalize well when training different types of models and when used for in-context learning of large language models (LLMs) with few-shot prompting.

### Conclusion

DiLM addresses the limitations of existing text dataset distillation methods by generating text-level synthetic datasets, improving the interpretability and model independence of the distilled data. The experimental results show that DiLM performs well on multiple text classification datasets and has broad application prospects.
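The gradient matching loss mentioned under Solutions can be illustrated with a minimal sketch. It is not DiLM's implementation: the toy linear classifier, the cosine-distance form of the loss, and all function names (`logistic_grad`, `gradient_matching_loss`) are assumptions made for illustration, and the generator language model and its differentiable back-propagation path are omitted. The sketch only shows what is being compared: the gradient a learner receives from real samples versus the gradient it receives from synthetic samples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad(weights, samples):
    """Mean binary cross-entropy gradient w.r.t. the weights of a toy linear classifier."""
    grad = [0.0] * len(weights)
    for features, label in samples:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x / len(samples)
    return grad

def gradient_matching_loss(grad_real, grad_syn):
    """Cosine distance between two gradient vectors (an assumed form of the loss)."""
    dot = sum(a * b for a, b in zip(grad_real, grad_syn))
    norm_real = math.sqrt(sum(a * a for a in grad_real))
    norm_syn = math.sqrt(sum(b * b for b in grad_syn))
    return 1.0 - dot / (norm_real * norm_syn + 1e-12)

# Hypothetical 2-feature data: (features, label) pairs.
weights = [0.1, -0.2]
real = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
good_syn = [([1.0, 0.2], 1), ([0.1, 1.0], 0)]   # labels consistent with the real data
bad_syn = [([1.0, 0.2], 0), ([0.1, 1.0], 1)]    # same features, labels flipped

g_real = logistic_grad(weights, real)
loss_good = gradient_matching_loss(g_real, logistic_grad(weights, good_syn))
loss_bad = gradient_matching_loss(g_real, logistic_grad(weights, bad_syn))
print(loss_good < loss_bad)
```

Synthetic samples that push the learner in the same direction as the real data give a smaller loss than samples that push it the opposite way. In a text-level method, this signal would be used to update the generator rather than the samples themselves.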

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation

Soft-Label Dataset Distillation and Text Dataset Distillation

DiM: Distilling Dataset into Generative Model

DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

Generative Dataset Distillation: Balancing Global Structure and Local Details

Latent Dataset Distillation with Diffusion Models

Data-Free Distillation of Language Model by Text-to-Text Transfer

What is Dataset Distillation Learning?

Vision-Language Dataset Distillation

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Data-to-Model Distillation: Data-Efficient Learning Framework

Dataset Distillation via Curriculum Data Synthesis in Large Data Era

Curriculum Dataset Distillation

Data-Free Adversarial Distillation

Dataset Distillation: A Comprehensive Review

CoDi: Conversational Distillation for Grounded Question Answering

D$^4$M: Dataset Distillation via Disentangled Diffusion Model

BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation

Adversarial Self-Supervised Data-Free Distillation for Text Classification

DistiLLM: Towards Streamlined Distillation for Large Language Models

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation