Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention

Susung Hong
2024-10-01
Abstract: Conditional diffusion models have shown remarkable success in visual content generation, producing high-quality samples across various domains, largely due to classifier-free guidance (CFG). Recent attempts to extend guidance to unconditional models have relied on heuristic techniques, resulting in suboptimal generation quality and unintended effects. In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. By defining the energy of self-attention, we introduce a method to reduce the curvature of the energy landscape of attention and use the output as the unconditional prediction. Practically, we control the curvature of the energy landscape by adjusting the Gaussian kernel parameter while keeping the guidance scale parameter fixed. Additionally, we present a query blurring method that is equivalent to blurring the entire attention weights without incurring quadratic complexity in the number of tokens. In our experiments, SEG achieves a Pareto improvement in both quality and the reduction of side effects. The code is available at https://github.com/SusungHong/SEG-SDXL.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to improve the sample quality of diffusion models without relying on conditioning or additional training, while reducing the side effects that guidance tends to introduce. It proposes Smoothed Energy Guidance (SEG), which approaches guidance from the energy-based perspective of the self-attention mechanism: SEG reduces the curvature of the attention energy landscape and uses the resulting output as the unconditional prediction. Unlike earlier heuristic approaches, SEG needs neither a specific training process nor external conditions. The curvature is controlled continuously through the Gaussian kernel parameter while the guidance scale stays fixed, which improves image quality without causing saturation or other side effects. SEG also introduces a query blurring technique that is equivalent to blurring the entire attention weights but avoids complexity that is quadratic in the number of tokens. Experimental results show that SEG improves image quality and fidelity in both unconditional and conditional generation tasks, including in comparison with other methods such as SAG and PAG.
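The query blurring claim can be checked numerically on a toy example. The sketch below is not the authors' implementation. It assumes the blur is a linear Gaussian filter applied along the query-token axis of the pre-softmax attention scores, and it uses a 1D token axis and a dense blur matrix purely for illustration. The names `gaussian_blur_matrix`, `matmul`, `transpose`, and `guided_prediction` are hypothetical helpers, and the extrapolation in `guided_prediction` is an assumed CFG-style form: the summary states only that the smoothed output serves as the unconditional prediction.

```python
import math
import random


def gaussian_blur_matrix(n, sigma):
    """Row-normalized n x n Gaussian blur matrix over a 1D token axis (assumed form)."""
    rows = []
    for i in range(n):
        w = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(n)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def guided_prediction(pred, pred_smoothed, scale):
    """Assumed CFG-style extrapolation away from the smoothed-attention prediction."""
    return [p + scale * (p - s) for p, s in zip(pred, pred_smoothed)]


random.seed(0)
n_tokens, dim, sigma = 8, 4, 1.5  # illustrative sizes, not taken from the paper
Q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
K = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
G = gaussian_blur_matrix(n_tokens, sigma)

# Route 1: form the full n x n score matrix, then blur it along the query axis.
blur_scores = matmul(G, matmul(Q, transpose(K)))
# Route 2: blur the n x d queries first, then form the scores.
blur_queries = matmul(matmul(G, Q), transpose(K))

# Matrix multiplication is associative, so the two routes agree up to rounding.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(blur_scores, blur_queries)
    for a, b in zip(row_a, row_b)
)
print(f"max difference between the two routes: {max_gap:.2e}")
```

The two routes agree because the blur is linear: G(QKᵀ) = (GQ)Kᵀ. Route 2 filters an n x d matrix instead of an n x n one. A practical version would presumably replace the dense matrix `G` with a small convolution kernel, so that the full score matrix never has to be blurred explicitly.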