Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

Yiheng Li, Heyang Jiang, Akio Kodaira, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu
2024-10-31
Abstract: In this paper, we point out that suboptimal noise-data mapping leads to slow training of diffusion models. During diffusion training, current methods diffuse each image across the entire noise space, resulting in a mixture of all images at every point in the noise layer. We emphasize that this random mixture of noise-data mapping complicates the optimization of the denoising function in diffusion models. Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion, a simple and effective method to improve the random mixture of noise-data mapping. In physics, miscibility can vary according to various intermolecular forces. Thus, immiscibility means that the mixing of molecular sources is distinguishable. Inspired by this concept, we propose an assignment-then-diffusion training strategy to achieve Immiscible Diffusion. As one example, prior to diffusing the image data into noise, we assign diffusion target noise for the image data by minimizing the total image-noise pair distance in a mini-batch. The assignment functions analogously to external forces to expel the diffuse-able areas of images, thus mitigating the inherent difficulties in diffusion training. Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image while preserving the Gaussian distribution of noise. In this way, each image is preferably projected to nearby noise. Experiments demonstrate that our method can achieve up to 3x faster training for unconditional Consistency Models on the CIFAR dataset, as well as for DDIM and Stable Diffusion on the CelebA and ImageNet datasets, and in class-conditional training and fine-tuning. In addition, we conducted a thorough analysis that sheds light on how it improves diffusion training speed while improving fidelity.
The code is available at https://yhli123.github.io/immiscible-diffusion
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses the slow training of diffusion models caused by suboptimal noise-data mapping. In conventional diffusion training, each image is diffused across the entire noise space, so any noise point may map to any source image. This random mixing complicates the optimization of the denoising function. The paper introduces "Immiscible Diffusion" to reduce the mixing and thereby improve training efficiency.

### Specific Problem Description

1. **Suboptimal Noise-Data Mapping**:
   - Because every image can reach every point in the noise space, the denoising function has to be learned from a mixture of all images at each noise point.
   - This slows convergence, increasing training time and resource consumption.
2. **Low Training Efficiency**:
   - Despite significant progress in image generation with diffusion models, training them remains time-consuming and resource-intensive.
   - For example, on the CIFAR-10 dataset, the Consistency Model, a representative few-step diffusion model, requires 10 days of training on 4 A6000 GPUs to reach the desired FID score.

### Solution

1. **Immiscible Diffusion**:
   - Inspired by the phenomenon of immiscibility in physics, the paper proposes an assignment-then-diffusion training strategy: target noise is assigned to each image before diffusion so that image-noise pairs are close to each other.
   - Minimizing the total image-noise pair distance within a mini-batch makes each image diffuse preferentially to nearby noise, while the overall Gaussian distribution of the noise is preserved.
2. **Simplified Implementation**:
   - The method requires only one line of code and does not modify the model architecture, noise scheduler, sampler, or inference method.
   - To reduce the cost of the assignment algorithm, the paper adopts a quantized assignment strategy: noise and image data are quantized into a low-precision format (e.g., 16-bit) before assignment, which significantly reduces computational overhead.

### Experimental Results

1. **Improved Training Efficiency**:
   - Experiments show improved training efficiency across multiple datasets and diffusion models. In particular, the immiscible unconditional Consistency Model trains up to 3x faster on the CIFAR-10 dataset.
   - The quality of the generated images also improves, with lower FID scores and more complete, clearer images.
2. **Generalization Ability**:
   - The method applies not only to the Consistency Model but also to other diffusion models such as DDIM and Stable Diffusion, and it performs well in unconditional generation, conditional generation, and fine-tuning.

### Summary

Immiscible Diffusion addresses the low training efficiency of diffusion models caused by suboptimal noise-data mapping. The method requires only one line of code and improves both training efficiency and generation quality across multiple datasets and diffusion models.
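
The assignment step described above, which pairs each image in a mini-batch with a noise sample so that the total image-noise distance is minimal, can be sketched as follows. This is a minimal illustration and not the authors' released code: the helper name `assign_noise`, the use of NumPy arrays, the Euclidean distance, and the choice of SciPy's `linear_sum_assignment` as the solver are all assumptions, and the low-precision quantization mentioned above is omitted for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def assign_noise(images: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Permute `noise` within the batch to minimize total image-noise distance.

    Hypothetical helper for illustration. `images` and `noise` share the
    shape (batch, ...); the result is `noise` reordered along axis 0.
    """
    batch = images.shape[0]
    # Flatten each sample to a vector so pairwise distances can be computed.
    flat_images = images.reshape(batch, -1)
    flat_noise = noise.reshape(batch, -1)
    # cost[i, j] = Euclidean distance between image i and noise sample j.
    cost = cdist(flat_images, flat_noise)
    # Solve the linear assignment problem: one distinct noise sample per image.
    _, noise_index = linear_sum_assignment(cost)
    # Only the order changes, so the batch still holds the same Gaussian samples.
    return noise[noise_index]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((16, 3, 8, 8))   # stand-in for an image batch
    eps = rng.standard_normal((16, 3, 8, 8))  # freshly sampled Gaussian noise
    eps_assigned = assign_noise(x0, eps)
    print(eps_assigned.shape)
```

In a training loop, a call like this would sit between sampling the noise and applying the forward diffusion step; everything else (model, scheduler, sampler) would stay unchanged, which is consistent with the summary's claim that the method does not touch those components.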