Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu
2024-05-09
Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage:
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning, Image and Video Processing, Signal Processing
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the high computational cost of Diffusion Models (DMs) when generating high-quality images. Although diffusion models excel at producing high-quality and diverse images, their architectural design, particularly the extensive use of attention modules, results in significant computational overhead. Existing methods for improving the efficiency of diffusion models rely primarily on retraining, which is computationally expensive and scales poorly.

To tackle this problem, the authors propose a framework called Attention-driven Training-free Efficient Diffusion Model (AT-EDM). The framework accelerates diffusion model inference by pruning redundant tokens at run time, guided by attention maps, without any retraining. Specifically, AT-EDM includes the following key components:

1. **Token Pruning in a Single Denoising Step**:
   - A new ranking algorithm, Generalized Weighted Page Rank (G-WPR), identifies redundant tokens.
   - A similarity-based recovery method restores pruned tokens for the convolution operations.
2. **Denoising-Steps-Aware Pruning (DSAP)**:
   - Based on an analysis of how attention maps change across denoising steps, the pruning budget is adjusted per denoising timestep to improve generation quality.

With these methods, AT-EDM improves efficiency while maintaining almost the same FID and CLIP scores as the full model. For example, compared to Stable Diffusion XL, it reduces FLOPs by 38.8% and achieves a speed-up of up to 1.53x.
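
The three components above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the ranking is a plain weighted-PageRank-style power iteration standing in for G-WPR, the recovery step uses cosine similarity on input features as an assumed similarity measure, and the step schedule (including which steps receive the larger budget and the numeric ratios) is a hypothetical placeholder. All function names are invented for this example.

```python
import math

def rank_tokens(attn, iters=20):
    """Score tokens by the attention they receive (PageRank-style sketch).

    attn[i][j] is the attention query token i pays to key token j;
    each row is assumed to sum to 1 (softmax output).
    """
    n = len(attn)
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Token j collects votes from every token i, weighted by i's score.
        votes = [sum(attn[i][j] * scores[i] for i in range(n)) for j in range(n)]
        total = sum(votes)
        scores = [v / total for v in votes]
    return scores

def keep_indices(scores, keep_ratio):
    """Return the sorted indices of the highest-scoring tokens."""
    k = max(1, round(keep_ratio * len(scores)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recover(features_in, features_out_kept, kept):
    """Rebuild a full-length sequence so a later convolution sees every position.

    Assumption: each pruned position copies the output of the kept token
    whose *input* feature is most similar to it (cosine similarity).
    """
    out_by_index = dict(zip(kept, features_out_kept))
    full = []
    for i, feat in enumerate(features_in):
        if i in out_by_index:
            full.append(out_by_index[i])
        else:
            nearest = max(kept, key=lambda k: cosine(feat, features_in[k]))
            full.append(out_by_index[nearest])
    return full

def keep_ratio_for_step(step, num_steps, first_keep=1.0, later_keep=0.6,
                        first_fraction=0.3):
    """Hypothetical step-aware budget: one ratio for the first steps, another after."""
    return first_keep if step < first_fraction * num_steps else later_keep

# Toy example with 4 tokens; token 3 receives almost no attention.
attn = [
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.4, 0.2, 0.0],
    [0.3, 0.3, 0.3, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]
scores = rank_tokens(attn)
kept = keep_indices(scores, keep_ratio=0.75)

features_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
features_out_kept = [[10.0], [20.0], [30.0]]  # stand-in outputs for kept tokens
restored = recover(features_in, features_out_kept, kept)

print(kept)
print(restored)
```

In this toy run, token 3 ranks lowest and is pruned; during recovery it takes the output of token 0, whose input feature is closest to its own. In a real pipeline, the keep ratio passed to `keep_indices` would come from a per-step schedule such as `keep_ratio_for_step`, and the attention matrix would come from the model's attention layers rather than being hard-coded.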