Distributed Boosting: an Enhancing Method on Dataset Distillation
Xuechao Chen, Wenchao Meng, Peiran Wang, Qihang Zhou
DOI: https://doi.org/10.1145/3627673.3679897
2024-01-01
Abstract: Dataset Distillation (DD) synthesizes a small, compressed dataset from a large original dataset while retaining the information needed to maintain efficacy. Efficient DD is an active research focus. Squeeze, Recover and Relabel (SRe2L) and Adversarial Prediction Matching (APM) are two advanced and efficient DD methods, yet their performance is only moderate at low volumes of distilled data. This paper proposes Distributed Boosting (DB), a method that significantly improves the performance of both algorithms at low distillation volumes, yielding DB-SRe2L and DB-APM. DB consists of three stages: Distribute & Encapsulate, Distill, and Integrate & Mix-relabel. Compared to SRe2L, DB-SRe2L improves performance by 25.2%, 26.9%, and 26.2% on full 224×224 ImageNet-1k at Images Per Class (IPC) 10, CIFAR-10 at IPC 10, and CIFAR-10 at IPC 50, respectively. Compared to APM, DB-APM improves performance by 21.2% and 20.9% on CIFAR-10 at IPC 10 and CIFAR-100 at IPC 1, respectively. Additionally, we provide a theoretical proof of convergence for DB. To the best of our knowledge, DB is the first DD method suitable for distributed parallel computing scenarios.
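To make the "low distillation volume" settings concrete, the following minimal sketch computes how many synthetic images each IPC setting named in the abstract corresponds to, and what fraction of the original training set that is. The class counts and training-set sizes are those of the standard public benchmarks, not figures from the paper, and the helper names `distilled_size` and `retained_fraction` are hypothetical.

```python
# Standard benchmark statistics (assumed here; not stated in the abstract).
DATASETS = {
    "ImageNet-1k": {"classes": 1000, "train_images": 1_281_167},
    "CIFAR-10": {"classes": 10, "train_images": 50_000},
    "CIFAR-100": {"classes": 100, "train_images": 50_000},
}


def distilled_size(dataset: str, ipc: int) -> int:
    """Total number of synthetic images at a given Images Per Class (IPC)."""
    return DATASETS[dataset]["classes"] * ipc


def retained_fraction(dataset: str, ipc: int) -> float:
    """Distilled set size as a fraction of the original training set."""
    return distilled_size(dataset, ipc) / DATASETS[dataset]["train_images"]


# The four (dataset, IPC) settings reported in the abstract.
SETTINGS = [("ImageNet-1k", 10), ("CIFAR-10", 10), ("CIFAR-10", 50), ("CIFAR-100", 1)]

for name, ipc in SETTINGS:
    size = distilled_size(name, ipc)
    frac = retained_fraction(name, ipc)
    print(f"{name} at IPC {ipc}: {size} images ({frac:.2%} of the training set)")
```

Under these assumptions, every reported setting keeps at most 1% of the original training images, which is the regime in which the abstract describes SRe2L and APM as only moderately effective.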