Using Mixture of Experts to Accelerate Dataset Distillation

Zhi Xu, Zhenyong Fu
DOI: https://doi.org/10.1016/j.jvcir.2024.104137
IF: 2.887
2024-01-01
Journal of Visual Communication and Image Representation
Abstract: Recently, large datasets have become increasingly necessary for most deep learning tasks; however, they bring problems such as disk storage demands and high computational expense. Dataset distillation is an emerging field that aims to synthesize a small dataset from the original dataset, so that a randomly initialized model trained on the distilled dataset achieves performance comparable to a model of the same architecture trained on the original dataset. Matching Training Trajectories (MTT) achieves leading performance in this field, but it needs to pre-train 200 expert models before the formal distillation process, a stage called the buffer process. In this paper, we propose a new method to reduce the time consumed by the buffer process. Concretely, we use a Mixture of Experts (MoE) to train several expert models in parallel during the buffer process. Experiments show that our method achieves a speedup of approximately 4∼8× in the buffer process while maintaining comparable distillation performance.
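To make the idea of the buffer process concrete, the following is a minimal toy sketch, not the authors' MoE architecture (which the abstract does not describe). It assumes that a buffer is a set of per-epoch parameter snapshots, one trajectory per expert, and it trains several tiny linear "experts" in lockstep on a shared stream of mini-batches. The function name `train_experts_lockstep`, the toy regression target, and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def train_experts_lockstep(num_experts=4, epochs=5, batches_per_epoch=20,
                           lr=0.1, seed=0):
    """Toy buffer process: several linear experts fit y = 2x + 1 together.

    Returns one trajectory per expert; each trajectory is a list of (w, b)
    snapshots: the initialisation followed by one snapshot per epoch.
    """
    rng = random.Random(seed)
    # Every expert starts from its own random initialisation (w, b).
    params = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
              for _ in range(num_experts)]
    trajectories = [[tuple(p)] for p in params]

    for _ in range(epochs):
        for _ in range(batches_per_epoch):
            # One mini-batch is drawn once and shared by all experts.
            xs = [rng.uniform(-1, 1) for _ in range(8)]
            ys = [2.0 * x + 1.0 for x in xs]
            for p in params:
                w, b = p
                # Gradients of the mean squared error for this expert.
                grad_w = sum(2 * (w * x + b - y) * x
                             for x, y in zip(xs, ys)) / len(xs)
                grad_b = sum(2 * (w * x + b - y)
                             for x, y in zip(xs, ys)) / len(xs)
                p[0] = w - lr * grad_w
                p[1] = b - lr * grad_b
        # End of epoch: store a snapshot of every expert in the buffer.
        for k, p in enumerate(params):
            trajectories[k].append(tuple(p))
    return trajectories


if __name__ == "__main__":
    buffer = train_experts_lockstep()
    for k, traj in enumerate(buffer):
        print(f"expert {k}: {len(traj)} snapshots, final (w, b) = {traj[-1]}")
```

In this sketch the experts stay fully independent and only share the data pass; a real implementation would use neural networks and, presumably, run the experts as one batched computation on an accelerator.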