Model-Agnostic Human Preference Inversion in Diffusion Models

Jeeyung Kim, Ze Wang, Qiang Qiu
2024-04-01
Abstract: Efficient text-to-image generation remains a challenging task due to the high computational costs associated with the multi-step sampling in diffusion models. Although distillation of pre-trained diffusion models has been successful in reducing sampling steps, low-step image generation often falls short in terms of quality. In this study, we propose a novel sampling design to achieve high-quality one-step image generation aligning with human preferences, particularly focusing on exploring the impact of the prior noise distribution. Our approach, Prompt Adaptive Human Preference Inversion (PAHI), optimizes the noise distributions for each prompt based on human preferences without the need for fine-tuning diffusion models. Our experiments showcase that the tailored noise distributions significantly improve image quality with only a marginal increase in computational cost. Our findings underscore the importance of noise optimization and pave the way for efficient and high-quality text-to-image synthesis.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses **efficient, high-quality one-step text-to-image generation**. When diffusion models (DMs) generate images from text, multi-step sampling makes inference computationally expensive. Distilling a pre-trained diffusion model reduces the number of sampling steps, but images generated in very few steps are usually of lower quality. The authors propose a new sampling design that targets high-quality one-step generation by optimizing the prior noise distribution so that the generated images align with human preferences. In particular, they study how the prior noise distribution affects image quality and propose **Prompt Adaptive Human Preference Inversion (PAHI)**, which optimizes the noise distribution for each prompt according to human preferences without fine-tuning the diffusion model.

### Main contributions

1. **Optimizing the noise distribution**: Image quality is improved by optimizing the noise distribution, especially in the one-step generation setting.
2. **Prompt Adaptive Human Preference Inversion (PAHI)**: A lightweight noise prediction model generates a customized noise distribution for each text prompt.
3. **Experimental verification**: Experiments show that the optimized noise distribution significantly improves image quality with only a small increase in computational cost.

### Method overview

- **Optimizing the noise distribution across all prompts**: The parameters \(\mu\) and \(\sigma\) of the noise distribution are optimized by minimizing an objective function, which in turn maximizes the quality score of the generated images.
- **Adaptive noise distribution**: A conditional noise prediction model is then built to predict customized noise distribution parameters \(\mu(c_i)\) and \(\sigma(c_i)\) from the input text prompt \(c_i\).
- **Experimental verification**: PAHI is compared against the standard Gaussian distribution in terms of image quality and computational efficiency.

### Experimental results

- **Comparison of human preference scores**: Images generated with PAHI achieve a higher win rate under multiple scoring models, especially PickScore and ImageReward.
- **Computational cost analysis**: PAHI maintains high image quality while adding only a slight increase in inference time.

In conclusion, this work significantly improves the quality of one-step text-to-image generation by optimizing the noise distribution and introducing an adaptive noise prediction model, offering a practical route to deploying diffusion models efficiently.
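
The idea of optimizing \(\mu\) and \(\sigma\) of the prior noise, as described in the method overview, can be illustrated with a minimal toy sketch. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: the "one-step generator" is the identity map, the "preference score" is a negative squared distance to a hypothetical `target` vector, and the function name `optimize_noise` and its hyperparameters are invented. The only thing it is meant to show is how a Gaussian's mean and standard deviation can be updated by gradient descent through reparameterized samples \(z = \mu + \sigma \epsilon\).

```python
import random


def optimize_noise(target, steps=500, batch=32, lr=0.05, seed=0):
    """Fit per-dimension mean/std of a Gaussian prior by gradient descent.

    Toy stand-ins (assumptions, not from the paper):
      - the one-step generator is the identity map, x = z
      - the preference score is -sum((x - target)^2), so loss = (z - t)^2
    """
    rng = random.Random(seed)
    dim = len(target)
    mu = [0.0] * dim      # start from the standard Gaussian N(0, I)
    sigma = [1.0] * dim

    for _ in range(steps):
        grad_mu = [0.0] * dim
        grad_sigma = [0.0] * dim
        for _ in range(batch):
            for d in range(dim):
                eps = rng.gauss(0.0, 1.0)
                z = mu[d] + sigma[d] * eps          # reparameterized sample
                dloss_dz = 2.0 * (z - target[d])    # gradient of (z - t)^2
                grad_mu[d] += dloss_dz / batch          # dz/dmu = 1
                grad_sigma[d] += dloss_dz * eps / batch  # dz/dsigma = eps
        for d in range(dim):
            mu[d] -= lr * grad_mu[d]
            # Clamp so the standard deviation stays strictly positive.
            sigma[d] = max(sigma[d] - lr * grad_sigma[d], 1e-3)
    return mu, sigma


if __name__ == "__main__":
    mu, sigma = optimize_noise([1.0, -2.0, 0.5])
    print("mu:", [round(m, 3) for m in mu])
    print("sigma:", [round(s, 3) for s in sigma])
```

In this toy setting the mean moves toward `target` and the standard deviation shrinks toward the clamp, because nothing in the toy score rewards diversity. A prompt-adaptive variant would replace the fixed `mu` and `sigma` with the outputs of a small model that takes the prompt as input; that model's architecture is not described in this summary, so it is left out here.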