Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann
2023-06-17
Abstract: The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to efficiently deploy large-scale diffusion models (such as Stable Diffusion) on mobile devices, achieving fast inference despite limited on-device compute and memory. Specifically, the paper points out:

1. **Existing problems**:
   - Large-scale diffusion models (such as Stable Diffusion 1.4) have more than 1 billion parameters, which strains the compute and memory available on mobile devices.
   - Running these models directly on mobile devices leads to high latency, especially during the iterative denoising process, which consumes a large amount of memory and increases processing time.
   - Although there have been attempts to deploy Stable Diffusion on mobile devices, these efforts are usually limited to specific devices or chipsets, and there is still room for improvement in inference latency.
2. **Objectives**:
   - Provide a series of optimization techniques that accelerate inference of large-scale diffusion models on GPU-equipped mobile devices.
   - Achieve the fastest reported inference latency (for example, under 12 seconds to generate a 512×512 image on Samsung S23 Ultra without INT8 quantization).
   - Expand the application range of generative AI and enhance the user experience on various devices.

Through these optimizations, the paper aims to enable large-scale diffusion models to run efficiently on a wider range of mobile devices, thereby reducing server costs, providing offline functionality, and enhancing user privacy.
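To make the resource constraint concrete, the following is a rough back-of-the-envelope sketch, not a calculation taken from the paper. It assumes exactly 1 billion parameters (the abstract only says "over 1 billion"), counts weight storage only (no activations or intermediate buffers), and treats the reported 12-second figure as if it were spent entirely in the 20 denoising iterations, which overstates the per-iteration time because text encoding and image decoding also take part of that budget. The function names are hypothetical.

```python
# Back-of-the-envelope estimates; all inputs are illustrative assumptions.

def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def per_iteration_budget_s(total_latency_s: float, iterations: int) -> float:
    """Upper bound on average time per denoising iteration, in seconds."""
    return total_latency_s / iterations


NUM_PARAMS = 1_000_000_000  # assumed lower bound ("over 1 billion" in the abstract)

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # 32-bit floats
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit floats
budget_s = per_iteration_budget_s(12.0, 20)  # figures quoted in the abstract

print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"per-iteration budget: at most {budget_s:.2f} s")
```

Even under these simplified assumptions, the weights alone occupy gigabytes of memory, and each denoising iteration has to finish in well under a second to reach the reported end-to-end latency.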