Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

2024-05-09

Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage:

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning, Image and Video Processing, Signal Processing

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the high computational cost of Diffusion Models (DMs) when generating high-quality images. Although diffusion models excel at producing high-quality and diverse images, their architectural design, particularly the extensive use of attention modules, results in significant computational overhead. Existing methods to improve the efficiency of diffusion models primarily rely on retraining, which is not only computationally expensive but also scales poorly.

To tackle this problem, the authors propose a framework called Attention-driven Training-free Efficient Diffusion Model (AT-EDM). This framework accelerates the inference process of diffusion models by pruning redundant tokens at run time, using attention maps and without any retraining. Specifically, AT-EDM includes the following key components:

1. **Token Pruning in a Single Denoising Step**:
   - A new ranking algorithm, Generalized Weighted Page Rank (G-WPR), identifies redundant tokens.
   - A similarity-based recovery method restores pruned tokens for the convolution operations.
2. **Denoising-Steps-Aware Pruning (DSAP)**:
   - By analyzing how attention maps change across denoising steps, it adjusts the pruning budget for different denoising timesteps to improve generation quality.

Through these methods, AT-EDM significantly enhances efficiency while maintaining almost the same FID and CLIP scores as the full model. For example, compared to the Stable Diffusion XL model, it reduces FLOPs by 38.8% and achieves up to a 1.53x speed-up.
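The components listed above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not give the G-WPR update rule, the recovery criterion, or the DSAP schedule, so everything here is an assumption. The sketch assumes a PageRank-style score in which a token is important when important tokens attend to it, uses cosine similarity between pre-pruning features to recover pruned tokens, and uses a simple two-phase pruning budget. All function names and default values (`rank_tokens`, `prune_tokens`, `recover_tokens`, `keep_ratio_for_step`, `base_keep`, `protected_fraction`) are hypothetical.

```python
import numpy as np


def rank_tokens(attn: np.ndarray, iters: int = 3) -> np.ndarray:
    """PageRank-style token scores from an (N, N) attention map (assumed rule).

    attn[i, j] is how strongly query token i attends to key token j.
    """
    n = attn.shape[0]
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Token j collects score from every token i that attends to it.
        score = attn.T @ score
        score = score / score.sum()
    return score


def prune_tokens(x: np.ndarray, attn: np.ndarray, keep_ratio: float):
    """Keep the highest-scoring tokens; return them with their sorted indices."""
    n = x.shape[0]
    k = max(1, int(round(keep_ratio * n)))
    order = np.argsort(-rank_tokens(attn), kind="stable")
    keep = np.sort(order[:k])
    return x[keep], keep


def recover_tokens(x_full: np.ndarray, x_kept: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """Rebuild a full-length sequence so a convolution can see a dense grid.

    Each pruned position copies the processed feature of the kept token whose
    pre-pruning feature is most similar (cosine similarity, an assumption).
    """
    unit = x_full / (np.linalg.norm(x_full, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit[keep].T          # (N, k) similarity to kept tokens
    nearest = sim.argmax(axis=1)       # index into `keep` for every position
    out = x_kept[nearest].copy()
    out[keep] = x_kept                 # kept tokens return to their own slots
    return out


def keep_ratio_for_step(step: int, total_steps: int,
                        base_keep: float = 0.6,
                        protected_fraction: float = 0.3) -> float:
    """Hypothetical step-aware budget: no pruning early, base_keep afterwards."""
    return 1.0 if step < protected_fraction * total_steps else base_keep


if __name__ == "__main__":
    # Toy example: every token attends mostly to token 0.
    attn = np.tile(np.array([0.7, 0.1, 0.1, 0.1]), (4, 1))
    x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
    kept, keep = prune_tokens(x, attn, keep_ratio_for_step(40, 50, base_keep=0.5))
    processed = kept * 2.0             # stand-in for the attention block output
    print(keep, recover_tokens(x, processed, keep))
```

In the toy run, tokens 0 and 1 are kept, and the two pruned tokens are filled with the processed features of whichever kept token they most resemble, so the output has the original sequence length again.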
