Abstract: Text-to-image diffusion models are a popular paradigm that synthesizes personalized images from a text prompt and random Gaussian noise. Although some noises are observed to be ``golden noises'' that achieve better text-image alignment and higher human preference than others, a machine learning framework for obtaining those golden noises is still lacking. To learn golden noises for diffusion sampling, we make three main contributions in this paper. First, we identify a new concept termed the \textit{noise prompt}, which aims to turn a random Gaussian noise into a golden noise by adding a small desirable perturbation derived from the text prompt. Following this concept, we formulate the \textit{noise prompt learning} framework, which systematically learns the ``prompted'' golden noise associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale \textit{noise prompt dataset}~(NPD) that contains 100k pairs of random noises and golden noises with the associated text prompts. With the prepared NPD as the training dataset, we train a small \textit{noise prompt network}~(NPNet) that directly learns to transform a random noise into a golden noise. The learned golden noise perturbation can be considered a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the effectiveness and generalization of NPNet in improving the quality of synthesized images across various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient controller that acts as a plug-and-play module with very limited additional inference and computational cost, as it simply provides a golden noise in place of a random noise without accessing the original pipeline.
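The noise prompt idea described above (a random noise plus a small perturbation derived from the text prompt) can be illustrated with a toy sketch. This is not the NPNet architecture, which the abstract does not specify: the function `noise_prompt`, the weights `w_noise` and `w_text`, the `scale` factor, and the latent and embedding sizes are all hypothetical, and the weights are random rather than trained on NPD.

```python
import numpy as np

def noise_prompt(noise, text_emb, w_noise, w_text, scale=0.1):
    """Toy stand-in: return the noise plus a small text-conditioned perturbation."""
    flat = noise.reshape(-1)
    # The perturbation depends on both the sampled noise and the prompt embedding.
    # tanh bounds every entry to [-1, 1], so scale caps the perturbation size.
    delta = np.tanh(w_noise @ flat + w_text @ text_emb)
    return noise + scale * delta.reshape(noise.shape)

rng = np.random.default_rng(0)
latent_shape = (4, 8, 8)          # hypothetical latent size
emb_dim = 16                      # hypothetical text-embedding size
n = int(np.prod(latent_shape))

noise = rng.standard_normal(latent_shape)       # the usual random Gaussian noise
text_emb = rng.standard_normal(emb_dim)         # placeholder for an encoded prompt
other_emb = rng.standard_normal(emb_dim)        # placeholder for a different prompt

# Untrained random weights, used only to make the sketch runnable.
w_noise = rng.standard_normal((n, n)) / np.sqrt(n)
w_text = rng.standard_normal((n, emb_dim)) / np.sqrt(emb_dim)

prompted = noise_prompt(noise, text_emb, w_noise, w_text)
prompted_other = noise_prompt(noise, other_emb, w_noise, w_text)
```

In this sketch the output has the same shape as the input noise, so it could be handed to a sampler wherever the initial noise is normally supplied, which mirrors the plug-and-play usage the abstract describes. The same starting noise yields different outputs for different prompt embeddings.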

In-Context Learning Unlocked for Diffusion Models

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

Prompt Diffusion Robustifies Any-Modality Prompt Learning

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

Contextualized Diffusion Models for Text-Guided Image and Video Generation

PromptFix: You Prompt and We Fix the Photo

Unleashing Text-to-Image Diffusion Models for Visual Perception

Context Diffusion: In-Context Aware Image Generation

Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing

Harnessing Diffusion Models for Visual Perception with Meta Prompts

Reverse Stable Diffusion: What prompt was used to generate this image?

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

On Discrete Prompt Optimization for Diffusion Models

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Improving Diffusion-Based Image Synthesis with Context Prediction

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding

Golden Noise for Diffusion Models: A Learning Framework