Abstract: We introduce a novel approach for prompt mixing, aiming to generate images at the intersection of multiple text prompts using pre-trained text-to-image diffusion models. At each time step during diffusion denoising, our algorithm forecasts predictions w.r.t. the generated image and makes informed text conditioning decisions. To do so, we leverage the connection between diffusion models (rooted in non-equilibrium thermodynamics) and the Black-Scholes model for pricing options in finance, and draw analogies between the variables in both contexts to derive an appropriate algorithm for prompt mixing using the Black-Scholes model. Specifically, the parallels between diffusion models and the Black-Scholes model enable us to leverage properties related to the dynamics of the Markovian model derived in the Black-Scholes algorithm. Our prompt-mixing algorithm is data-efficient, meaning it does not need additional training. Furthermore, it operates without human intervention or hyperparameter tuning. We highlight the benefits of our approach by comparing it qualitatively and quantitatively to other prompt mixing techniques, including linear interpolation, alternating prompts, step-wise prompt switching, and CLIP-guided prompt selection, across various scenarios such as single object per text prompt, multiple objects per text prompt, and objects against backgrounds. Code is available at

What problem does this paper attempt to address?

This paper addresses the problem of prompt mixing in text-to-image diffusion models: generating a single image that conforms to multiple text prompts at once. The authors propose a method that uses the Black-Scholes model from finance to dynamically select the most relevant text prompt at each denoising step, so that the resulting high-quality image balances the features of all given prompts.

### Main Contributions:

1. **Algorithm Design**: A new prompt mixing algorithm evaluates the "cost" associated with each text prompt at each step of the diffusion denoising process and conditions on the prompt whose cost is highest (the cost here is a quantity to be maximized).
2. **Theoretical Foundation**: Diffusion models are linked with the Black-Scholes model from finance, using the concept of Markovian time series to predict the score of each text prompt.
3. **Performance Validation**: Qualitative and quantitative experiments demonstrate the advantages of this method over several baselines (linear interpolation, alternating prompts, step-wise prompt switching, and CLIP-guided prompt selection).

### Experimental Results:

- The method is validated in several experimental settings (single object per prompt, multiple objects per prompt, and objects against backgrounds).
- Across these settings, the method outperforms the other methods on both the CLIP-combined and CLIP-add scores.

### Future Work Directions:

- Explore more advanced image-language models to improve evaluation metrics.
- Study the effect of prompt mixing with more text prompts.
- Extend this method to non-Gaussian diffusion models or one-step diffusion models.

In summary, this paper aims to improve the performance of existing text-to-image diffusion models on the task of prompt mixing by introducing the Black-Scholes model, thereby generating high-quality images that better conform to multiple text descriptions.
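
The per-step selection rule described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the mapping used here (a prompt's current alignment score as the spot price, a shared target score as the strike, a per-prompt volatility, and the fraction of denoising steps remaining as time to expiry) is an assumption made for illustration, and the names `black_scholes_call` and `select_prompt` are hypothetical. Only the closed-form Black-Scholes call price itself is standard.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       sigma: float, tau: float) -> float:
    """Standard Black-Scholes price of a European call option.

    spot, strike > 0; sigma > 0 is the volatility; tau is time to expiry.
    """
    if spot <= 0 or strike <= 0 or sigma <= 0:
        raise ValueError("spot, strike and sigma must be positive")
    if tau <= 0:
        # At expiry the option is worth its intrinsic value.
        return max(spot - strike, 0.0)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (
        sigma * math.sqrt(tau)
    )
    d2 = d1 - sigma * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)


def select_prompt(scores, sigmas, strike, rate, tau):
    """Return the index of the prompt with the highest call value.

    ASSUMPTION (not from the paper): scores[i] is prompt i's current
    alignment score, sigmas[i] its estimated volatility, strike a shared
    target score, and tau the fraction of denoising steps remaining.
    """
    values = [
        black_scholes_call(s, strike, rate, v, tau)
        for s, v in zip(scores, sigmas)
    ]
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for two prompts halfway through denoising.
    idx = select_prompt(scores=[0.25, 0.31], sigmas=[0.2, 0.2],
                        strike=0.3, rate=0.0, tau=0.5)
    print("condition on prompt", idx)
```

In a real pipeline the scores would presumably come from an image-text similarity model evaluated on the intermediate prediction, and the chosen index would determine which prompt embedding conditions the next denoising step; both of those pieces are outside this sketch.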

Prompt Mixing in Diffusion Models using the Black Scholes Algorithm

On Discrete Prompt Optimization for Diffusion Models

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Prompt Diffusion Robustifies Any-Modality Prompt Learning

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

PromptFix: You Prompt and We Fix the Photo

Dynamic Prompt Optimizing for Text-to-Image Generation

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

MagicMix: Semantic Mixing with Diffusion Models

Reverse Stable Diffusion: What prompt was used to generate this image?

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

In-Context Learning Unlocked for Diffusion Models

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models