Accelerating Diffusion Models with One-to-Many Knowledge Distillation

Linfeng Zhang, Kaisheng Ma
2024-10-05
Abstract: Significant advancements in image generation have been made with diffusion models. Nevertheless, compared with previous generative models, diffusion models incur substantial computational overhead, which prevents real-time generation. Recent approaches have aimed to accelerate diffusion models by reducing the number of sampling steps through improved sampling techniques or step distillation. However, methods that reduce the computational cost of each timestep remain relatively unexplored. Observing that diffusion models exhibit varying input distributions and feature distributions at different timesteps, we introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models, where each student diffusion model is trained to learn the teacher's knowledge for a subset of continuous timesteps. Experiments on CIFAR10, LSUN Church, and CelebA-HQ with DDPM, and on COCO30K with Stable Diffusion, show that O2MKD can be applied to previous knowledge distillation and fast sampling methods to achieve significant acceleration. Code will be released on GitHub.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper primarily addresses the high computational overhead that keeps diffusion models from real-time image generation. Specifically:

1. **Problem Background**: Although diffusion models perform excellently in image generation, the iterative denoising process is computationally expensive. The resulting latency limits their deployment on edge devices and in interactive applications.
2. **Limitations of Existing Methods**: Current methods to accelerate diffusion models mainly focus on reducing the number of sampling steps, for example through improved sampling techniques or step distillation. There is little research on reducing the computational cost of each individual timestep.
3. **Proposed New Method**: The paper introduces "One-to-Many Knowledge Distillation" (O2MKD), which distills the knowledge of a single teacher model into multiple student models, with each student learning the teacher's knowledge for one subset of continuous timesteps. Decomposing the task into sub-tasks reduces the learning difficulty for each student, thereby improving image generation quality.
4. **Experimental Validation**: Experiments on CIFAR10, LSUN Church, CelebA-HQ, and COCO30K show that O2MKD can significantly accelerate diffusion models and outperform traditional knowledge distillation methods in image fidelity. O2MKD is also compatible with other acceleration techniques, such as DDIM.

In summary, the paper aims to address the low computational efficiency of diffusion models through O2MKD, achieving faster and higher-quality image generation.
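The one-to-many idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`partition_timesteps`, `student_index`, `distillation_loss`) are hypothetical, the equal-width split of timesteps is an assumption (the source only says each student covers a subset of continuous timesteps), and the mean-squared-error objective stands in for whatever distillation loss the paper actually uses.

```python
from typing import List, Sequence


def partition_timesteps(num_timesteps: int, num_students: int) -> List[range]:
    """Split timesteps 0..num_timesteps-1 into contiguous, near-equal groups.

    Assumption: an equal-width split. The source does not specify group sizes.
    """
    if not 1 <= num_students <= num_timesteps:
        raise ValueError("need 1 <= num_students <= num_timesteps")
    base, extra = divmod(num_timesteps, num_students)
    groups: List[range] = []
    start = 0
    for k in range(num_students):
        # The first `extra` groups absorb one leftover timestep each.
        size = base + (1 if k < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups


def student_index(t: int, groups: Sequence[range]) -> int:
    """Return the index of the student responsible for timestep t."""
    for k, group in enumerate(groups):
        if t in group:
            return k
    raise ValueError(f"timestep {t} is outside every group")


def distillation_loss(teacher_out: Sequence[float],
                      student_out: Sequence[float]) -> float:
    """Mean squared error between teacher and student predictions.

    Assumption: a plain output-matching loss used here for illustration only.
    """
    if len(teacher_out) != len(student_out):
        raise ValueError("teacher and student outputs must have equal length")
    squared = [(a - b) ** 2 for a, b in zip(teacher_out, student_out)]
    return sum(squared) / len(squared)


# Example: 1000 timesteps shared among 4 students.
groups = partition_timesteps(1000, 4)
# During training, a sample drawn at timestep t would update only the
# student selected by student_index(t, groups).
```

In this sketch, a sampler would call `student_index(t, groups)` at each denoising step and run only that one student, so the per-step cost is that of a single small model regardless of how many students exist. Because the routing depends only on `t`, it is independent of how the sequence of timesteps is chosen, which is consistent with the summary's note that O2MKD can be combined with fast samplers such as DDIM.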