Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
2023-10-19
Abstract: Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch. We achieve outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on CelebA-64$\times$64, 1.93 on AFHQv2-Wild-64$\times$64, and 2.72 on ImageNet-256$\times$256. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Patch-Diffusion.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the long training time and large data requirements of diffusion models. Diffusion models generate high-quality images, but training them is slow and data-hungry, which limits their adoption and shuts out researchers who lack sufficient computational resources.

To tackle this, the paper proposes Patch Diffusion, a patch-wise training framework for diffusion models. Instead of operating on full images, it performs conditional score matching on image patches, which lowers the computational cost of each training iteration and improves data efficiency. To keep the model aware of global structure while training on local crops, each patch carries its location in the original image as additional coordinate channels, and the patch size is randomized and varied throughout training so that cross-region dependencies are encoded at multiple scales. Sampling is as easy as in the original diffusion model.

With these changes, the paper reports at least 2 times faster training while maintaining comparable or better generation quality. It also reports improved performance when training from scratch on relatively small datasets, for example with as few as 5,000 images. The authors present this as a step toward making diffusion model training accessible to a broader range of users.
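
The patch-level conditioning described above can be illustrated with a short sketch. This is a minimal NumPy example, not the authors' implementation: the function name `sample_patch_with_coords`, the candidate patch sizes, the uniform choice among them, and the normalization of coordinates to [-1, 1] are all assumptions made for illustration.

```python
import numpy as np

def sample_patch_with_coords(image, patch_sizes=(16, 32, 64), rng=None):
    """Crop a random square patch and append two coordinate channels.

    image: array of shape (C, H, W).
    Returns a float32 array of shape (C + 2, s, s), where s is the
    sampled patch size. Names and defaults here are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = image.shape

    # Randomize the patch size (assumed: uniform over sizes that fit).
    valid = [s for s in patch_sizes if s <= min(h, w)]
    s = int(rng.choice(valid))

    # Random top-left corner such that the patch stays inside the image.
    top = int(rng.integers(0, h - s + 1))
    left = int(rng.integers(0, w - s + 1))
    patch = image[:, top:top + s, left:left + s].astype(np.float32)

    # Coordinates are defined over the FULL image (assumed range [-1, 1]),
    # then sliced, so each patch records where it sits in the original.
    ys = np.linspace(-1.0, 1.0, h)[top:top + s]
    xs = np.linspace(-1.0, 1.0, w)[left:left + s]
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([xx, yy]).astype(np.float32)

    # The network input is the patch with the coordinate channels appended.
    return np.concatenate([patch, coords], axis=0)
```

In a training loop, a function like this would be applied to each image before adding noise, so that the denoiser sees a smaller input per iteration along with the patch's position. When the sampled size equals the full resolution, the output is the whole image with a full coordinate grid attached.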