Abstract: Dataset condensation is an emerging technique that generates a small synthetic dataset for training deep neural networks at lower cost. Its objective is to ensure that a model trained on the synthetic dataset performs comparably to a model trained on the full dataset. However, existing methods focus predominantly on classification tasks and are difficult to adapt to time series forecasting (TS-forecasting). This difficulty stems from a difference in how synthetic data is evaluated. In classification, the synthetic data is considered well-distilled if the model trained on the full dataset and the model trained on the synthetic dataset yield identical labels for the same input, regardless of differences in their output logit distributions. In TS-forecasting, by contrast, the effectiveness of synthetic data distillation is determined by the distance between the predictions of the two models, and the synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting imposes a stricter evaluation criterion than classification. To bridge this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and, based on this analysis, propose a new one-line plugin for dataset condensation named Dataset Condensation for Time Series Forecasting (CondTSF). Plugging CondTSF into previous dataset condensation methods reduces the distance between the predictions of the model trained on the full dataset and the model trained on the synthetic dataset, thereby improving performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
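To make the contrast between the two evaluation criteria concrete, here is a minimal NumPy sketch. It assumes label agreement (matching argmax) as the classification criterion and mean squared error as the forecasting distance; both are illustrative choices, not the metrics specified in the paper, and the function names `classification_agreement` and `forecast_distance` are hypothetical.

```python
import numpy as np


def classification_agreement(logits_full, logits_syn):
    """Fraction of inputs on which two classifiers predict the same label.

    Only the argmax of each row matters; logit magnitudes are ignored.
    """
    labels_full = np.argmax(logits_full, axis=-1)
    labels_syn = np.argmax(logits_syn, axis=-1)
    return float(np.mean(labels_full == labels_syn))


def forecast_distance(pred_full, pred_syn):
    """Mean squared distance between two forecasts, over every predicted point.

    Assumption: MSE stands in for the generic "distance between predictions".
    """
    return float(np.mean((pred_full - pred_syn) ** 2))


# Two classifiers with very different logits but the same argmax on both inputs.
logits_full = np.array([[5.0, 1.0, 0.0], [0.2, 3.0, 0.1]])
logits_syn = np.array([[1.1, 1.0, 0.9], [0.0, 0.5, 0.4]])
print(classification_agreement(logits_full, logits_syn))  # 1.0

# Two forecasts over a 4-step horizon that match except at the last step.
pred_full = np.array([1.0, 2.0, 3.0, 4.0])
pred_syn = np.array([1.0, 2.0, 3.0, 6.0])
print(forecast_distance(pred_full, pred_syn))  # 1.0
```

The classifiers count as a perfect match even though their logits differ substantially, while a single mismatched forecast point already produces a nonzero distance. This is the sense in which the forecasting criterion is stricter.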

Dataset Condensation for Recommendation

TF-DCon: Leveraging Large Language Models (LLMs) to Empower Training-Free Dataset Condensation for Content-Based Recommendation

Dataset Condensation via Efficient Synthetic-Data Parameterization

Elucidating the Design Space of Dataset Condensation

Dataset Condensation with Distribution Matching

Dataset Condensation with Gradient Matching

CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

User Interest Dynamics on Personalized Recommendation

ConsRec: Learning Consensus Behind Interactions for Group Recommendation

IntentGC: a Scalable Graph Convolution Framework Fusing Heterogeneous Information for Recommendation

Calibrated Dataset Condensation for Faster Hyperparameter Search

Multisize Dataset Condensation

You Only Condense Once: Two Rules for Pruning Condensed Datasets

AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets

Dataset Regeneration for Sequential Recommendation

Data Efficiency for Large Recommendation Models

Dataset Condensation for Time Series Classification via Dual Domain Matching

Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation

Interpolative Distillation for Unifying Biased and Debiased Recommendation

Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version