Abstract: Efficient text-to-image generation remains a challenging task due to the high computational costs associated with the multi-step sampling in diffusion models. Although distillation of pre-trained diffusion models has been successful in reducing sampling steps, low-step image generation often falls short in terms of quality. In this study, we propose a novel sampling design to achieve high-quality one-step image generation aligning with human preferences, particularly focusing on exploring the impact of the prior noise distribution. Our approach, Prompt Adaptive Human Preference Inversion (PAHI), optimizes the noise distributions for each prompt based on human preferences without the need for fine-tuning diffusion models. Our experiments showcase that the tailored noise distributions significantly improve image quality with only a marginal increase in computational cost. Our findings underscore the importance of noise optimization and pave the way for efficient and high-quality text-to-image synthesis.

What problem does this paper attempt to address?

The problem this paper attempts to solve is **efficient, high-quality one-step text-to-image generation**. When Diffusion Models (DMs) are used for text-to-image generation, the multi-step sampling process incurs high computational cost. Distilling pre-trained diffusion models can reduce the number of sampling steps, but images generated in only a few steps are usually of lower quality. To address this, the authors propose a new sampling design that aims for high-quality one-step image generation by optimizing the noise distribution so that the outputs align with human preferences. In particular, they explore how the prior noise distribution affects generated image quality and propose a method named **Prompt Adaptive Human Preference Inversion (PAHI)**, which optimizes the noise distribution for each prompt according to human preferences without fine-tuning the diffusion model.

### Main contributions

1. **Optimizing the noise distribution**: Improve image quality by optimizing the noise distribution, especially in the one-step generation setting.
2. **Prompt Adaptive Human Preference Inversion (PAHI)**: Introduce a lightweight noise prediction model that generates customized noise distributions for different text prompts.
3. **Experimental verification**: Experiments show that the optimized noise distribution significantly improves image quality with a relatively small increase in computational cost.

### Method overview

- **Optimizing the noise distribution for all prompts**: Optimize the parameters \(\mu\) and \(\sigma\) of the noise distribution by minimizing an objective function, thereby maximizing the quality score of the generated images.
- **Adaptive noise distribution**: Further construct a conditional noise prediction model that predicts customized noise distribution parameters \(\mu(c_i)\) and \(\sigma(c_i)\) from the input text prompt \(c_i\).
- **Experimental verification**: Compare against the standard Gaussian distribution to verify the advantage of PAHI in image quality and computational efficiency.

### Experimental results

- **Comparison of human preference scores**: Images generated with PAHI achieve a higher win rate on multiple scoring models, especially PickScore and ImageReward.
- **Computational cost analysis**: PAHI maintains high image quality while adding only a small amount of inference time.

In conclusion, this research significantly improves the quality of one-step text-to-image generation by optimizing the noise distribution and introducing an adaptive noise prediction model, offering a practical direction for applying diffusion models.
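The method overview above can be illustrated with a small toy sketch. It is not the authors' implementation: the one-step generator (`generate`), the preference scorer (`score`), the linear noise predictor, the embedding size `D`, and all hyperparameters are hypothetical stand-ins chosen so the example runs with NumPy alone. What it does share with the description is the overall shape: the generator stays frozen, noise is sampled as \(z = \mu(c) + \sigma(c)\,\epsilon\), and only the parameters of the noise predictor are updated to raise the score.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # toy latent / prompt-embedding dimension (assumption)

def generate(z, c):
    # Hypothetical frozen one-step generator; it is never updated below.
    return np.tanh(z + c)

def score(x):
    # Hypothetical preference scorer: higher is better, maximum is 0.
    return -np.mean((x - 0.5) ** 2, axis=-1)

def score_grad_wrt_z(z, c):
    # Analytic d(score)/dz for the toy generator and scorer above.
    x = generate(z, c)
    return -2.0 / D * (x - 0.5) * (1.0 - x ** 2)

def init_params():
    # Zero weights give mu(c) = 0 and sigma(c) = 1: the standard Gaussian prior.
    return {"W_mu": np.zeros((D, D)), "b_mu": np.zeros(D),
            "W_s": np.zeros((D, D)), "b_s": np.zeros(D)}

def noise_params(p, c):
    # Linear prompt-conditional predictor for mu(c) and log sigma(c).
    mu = c @ p["W_mu"] + p["b_mu"]
    log_sigma = c @ p["W_s"] + p["b_s"]
    return mu, log_sigma

def train(prompts, steps=2000, lr=0.2, batch=32):
    p = init_params()
    for _ in range(steps):
        c = prompts[rng.integers(len(prompts), size=batch)]
        eps = rng.standard_normal((batch, D))
        mu, log_sigma = noise_params(p, c)
        sigma = np.exp(log_sigma)
        z = mu + sigma * eps              # reparameterized noise sample
        g_mu = score_grad_wrt_z(z, c)     # dz/dmu = 1
        g_ls = g_mu * sigma * eps         # dz/dlog_sigma = sigma * eps
        # Gradient ascent on the batch-mean score; the generator is untouched.
        p["W_mu"] += lr * c.T @ g_mu / batch
        p["b_mu"] += lr * g_mu.mean(axis=0)
        p["W_s"] += lr * c.T @ g_ls / batch
        p["b_s"] += lr * g_ls.mean(axis=0)
    return p

def mean_score(p, prompts, n_samples=256):
    # Average score over prompts, using a fixed seed for a fair comparison.
    eval_rng = np.random.default_rng(1)
    total = 0.0
    for c in prompts:
        eps = eval_rng.standard_normal((n_samples, D))
        mu, log_sigma = noise_params(p, c[None, :])
        z = mu + np.exp(log_sigma) * eps
        total += score(generate(z, c[None, :])).mean()
    return total / len(prompts)

prompts = rng.standard_normal((16, D))        # toy prompt embeddings
baseline = mean_score(init_params(), prompts)  # standard Gaussian noise
tuned = mean_score(train(prompts), prompts)    # prompt-adaptive noise
print(f"standard Gaussian: {baseline:.3f}  prompt-adaptive: {tuned:.3f}")
```

Dropping the `W_mu` and `W_s` terms leaves a single \(\mu\) and \(\sigma\) shared by every prompt, which corresponds to the first bullet of the method overview. In a real setting the toy scorer would be replaced by a learned preference model such as PickScore or ImageReward, and gradients would come from automatic differentiation rather than a hand-written derivative.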

Model-Agnostic Human Preference Inversion in Diffusion Models

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation

Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

Golden Noise for Diffusion Models: A Learning Framework

Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

Training-free Diffusion Model Alignment with Sampling Demons

Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Saliency Guided Optimization of Diffusion Latents

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis

Observation-Guided Diffusion Probabilistic Models

An Improved Method for Personalizing Diffusion Models