Abstract: The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.

What problem does this paper attempt to address?

This paper addresses how to improve image quality in image-generating diffusion models without giving up control over variation. Existing classifier-free guidance (CFG) improves condition alignment and image quality, but it reduces variation and has further limitations: it applies only to conditional generation, and it can cause sampling trajectories to deviate from the desired distribution.

The authors observe that guiding generation with a smaller, less-trained version of the main model makes it possible to control image quality without sacrificing variation. They propose a method called "autoguidance", which uses a weaker version of the main model (e.g., one with limited capacity or training time) as the guiding model instead of an unconditional model. On ImageNet this yields an FID of 1.01 at 64×64 resolution and 1.25 at 512×512 resolution, both records at the time of publication. The method also applies to unconditional diffusion models and improves their quality.

The paper analyzes why CFG improves image quality and shows that models with limited capacity tend to overemphasize low-probability regions. Because the weaker model makes errors similar to those of the main model, only stronger, the difference between the two predictions indicates where the main model errs, and guidance pushes samples away from those errors. Experiments show that autoguidance works as long as the guiding model suffers from degradations compatible with those of the main model. In short, the paper proposes a guidance strategy that improves image quality while maintaining variation.
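The summary above describes guidance only in words. As a minimal sketch, assuming guidance is written in the usual way as a linear extrapolation from the guiding model's prediction through the main model's prediction, the combination step could look like the following. The names `guided_denoise` and `gaussian_denoiser` are hypothetical, and the toy Gaussian denoisers are stand-ins for illustration, not the networks used in the paper.

```python
from typing import Callable, List, Sequence

# A denoiser maps a noisy sample x and a noise level sigma to a clean estimate.
Denoiser = Callable[[Sequence[float], float], List[float]]


def guided_denoise(main: Denoiser, guide: Denoiser,
                   x: Sequence[float], sigma: float, w: float) -> List[float]:
    """Combine two denoisers as guide + w * (main - guide).

    w = 1 returns the main model's prediction unchanged; w > 1 pushes the
    result further away from the guiding model's prediction.
    """
    d_main = main(x, sigma)
    d_guide = guide(x, sigma)
    return [g + w * (m - g) for m, g in zip(d_main, d_guide)]


def gaussian_denoiser(mean: float, std: float) -> Denoiser:
    """Ideal denoiser for 1-D Gaussian data N(mean, std^2) (toy stand-in)."""
    def denoise(x: Sequence[float], sigma: float) -> List[float]:
        var = std * std
        noise_var = sigma * sigma
        return [(var * xi + noise_var * mean) / (var + noise_var) for xi in x]
    return denoise


# Assumption for illustration: the "bad" model fits a broader distribution
# than the main model, mimicking a weaker model that spreads density too widely.
main_model = gaussian_denoiser(mean=0.0, std=1.0)
weak_model = gaussian_denoiser(mean=0.0, std=2.0)

x_noisy = [2.0]
unguided = guided_denoise(main_model, weak_model, x_noisy, sigma=1.0, w=1.0)
guided = guided_denoise(main_model, weak_model, x_noisy, sigma=1.0, w=2.0)
print(unguided, guided)
```

In this form, CFG and autoguidance differ only in which model is passed as `guide`: an unconditional model for CFG, or a weaker version of the main model (under the same conditioning) for autoguidance. In the toy setup, a guidance weight above 1 pulls the estimate closer to the high-density region around the mean than the main model does on its own.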

Guiding a Diffusion Model with a Bad Version of Itself
