Abstract: Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code and a screencast video demo of NeuroPrompts publicly available.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address the issue of prompt design optimization in text-to-image generation models. Despite significant advancements in text-to-image diffusion models in recent years, generating high-quality images often requires human experts to perform prompt engineering. This not only raises the barrier for users but also limits the widespread applicability of these models. To this end, the paper proposes **NeuroPrompts**, an adaptive framework that can automatically enhance user-provided prompts to improve the quality of generated images.

### Specific Problems and Solutions

1. **Problems**:
   - **Prompt Sensitivity**: Text-to-image models are highly sensitive to prompts, and minor changes can lead to significant differences in the quality of generated images.
   - **User Skill Requirements**: Ordinary users lack the expertise in prompt optimization, making it difficult for them to generate high-quality images.
   - **Style Control**: Users wish to retain control over stylistic features when generating images.
2. **Solutions**:
   - **NeuroPrompts Framework**: By using a pre-trained language model (LM) and a reinforcement learning (PPO) algorithm, it automatically converts users' natural descriptions into optimized prompts to improve the quality of generated images.
   - **Constrained Decoding**: Utilizing the NeuroLogic decoding algorithm, it allows users to control attributes such as style, format, and artist by specifying a set of constraints.
   - **Interactive Application**: An interactive application was developed where users can input initial prompts and select different styles and attributes. The system will automatically optimize the prompts and generate images.

### Experimental Validation

- **Dataset**: A large number of human-created prompts from the DiffusionDB dataset were used for supervised fine-tuning and reinforcement learning.
- **Evaluation Metrics**: Aesthetics Score and PickScore were used to evaluate the quality of images generated from optimized prompts.
- **Experimental Results**: Images generated from optimized prompts significantly outperformed those from non-optimized prompts in terms of aesthetics score, even surpassing prompts created by human experts.

### Conclusion

NeuroPrompts lowers the barrier for users to utilize text-to-image generation models by automating prompt optimization, thereby improving the quality of generated images. This enables more users to easily generate high-quality images. Future work will further expand the application scope of NeuroPrompts, including video generation models and other scenarios requiring automated prompt optimization.
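
The constraint-set idea mentioned under the solutions can be illustrated with a small sketch. It assumes constraints are written as a list of clauses, where each clause lists alternative phrases and is satisfied if any one of them appears in the prompt. The names `satisfied_clauses`, `satisfies_all`, and `pick_candidate`, along with the example phrases, are hypothetical and do not come from the paper. The sketch only filters finished candidate prompts after the fact; NeuroLogic decoding instead enforces constraints during beam search, which is not reproduced here.

```python
from typing import Sequence

# One clause = alternative phrases; the clause holds if any phrase appears.
Clause = Sequence[str]


def satisfied_clauses(prompt: str, clauses: Sequence[Clause]) -> int:
    """Count how many clauses have at least one phrase present in the prompt."""
    text = prompt.lower()
    return sum(
        any(phrase.lower() in text for phrase in clause) for clause in clauses
    )


def satisfies_all(prompt: str, clauses: Sequence[Clause]) -> bool:
    """True when every clause in the constraint set is satisfied."""
    return satisfied_clauses(prompt, clauses) == len(clauses)


def pick_candidate(candidates: Sequence[str], clauses: Sequence[Clause]) -> str:
    """Return the candidate satisfying the most clauses (first one wins ties)."""
    return max(candidates, key=lambda c: satisfied_clauses(c, clauses))


# Hypothetical constraint set: one clause for style, one for format.
constraints = [
    ["oil painting", "watercolor"],
    ["trending on artstation"],
]

# Hypothetical enhanced prompts, e.g. sampled from a prompt-enhancement model.
candidates = [
    "a cat on a sofa, highly detailed",
    "a cat on a sofa, watercolor, soft light",
    "a cat on a sofa, oil painting, trending on artstation",
]

best = pick_candidate(candidates, constraints)
print(best)
```

Post-hoc filtering like this can only choose among prompts that were already generated, so it may return a prompt that still violates some clauses; enforcing constraints during decoding avoids that limitation.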

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Optimizing Prompts for Text-to-Image Generation

Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

Dynamic Prompt Optimizing for Text-to-Image Generation

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

Best Prompts for Text-to-Image Models and How to Find Them

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation

PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models

Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Manipulating Embeddings of Stable Diffusion Prompts

RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation