Abstract: TIPO (Text to Image with text pre-sampling for Prompt Optimization) is an innovative framework designed to enhance text-to-image (T2I) generation by using a language model (LM) for automatic prompt engineering. By refining and extending user-provided prompts, TIPO bridges the gap between simple inputs and the detailed prompts required for high-quality image generation. Unlike previous approaches that rely on Large Language Models (LLMs) or reinforcement learning (RL), TIPO adjusts user input prompts to match the distribution of a trained prompt dataset, and it avoids heavy runtime cost by using a lightweight model. This pre-sampling approach enables efficient and scalable prompt optimization, grounded in the model's training distribution. Experimental results demonstrate TIPO's effectiveness in improving aesthetic scores, reducing image corruption, and better aligning generated images with dataset distributions. These findings highlight the critical role of prompt engineering in T2I systems and open avenues for broader applications of automatic prompt refinement.

What problem does this paper attempt to address?

This paper addresses the problem of generating high-quality images in the Text-to-Image (T2I) task when users supply only simple prompts. Specifically, it proposes a framework called TIPO (Text to Image with text pre-sampling for Prompt Optimization), which optimizes prompts through pre-sampling in order to improve the quality of generated images, reduce image distortion, and better align the generated images with the dataset distribution.

### Main Issues:

1. **Bottleneck of High-Quality Image Generation**: Existing T2I models usually require detailed, highly descriptive prompts to generate high-quality images. Writing such prompts is a high barrier for ordinary users, which limits the usability and accessibility of these models.
2. **Limitations of Existing Prompt Expansion Methods**: Existing methods rely on either large language models (LLMs) or reinforcement learning (RL) to expand prompts. These approaches are either computationally expensive or difficult to generalize across different T2I models.
3. **Inconsistency in Prompt Optimization**: Existing methods often overlook the consistency between the optimized prompt and the training data distribution, so the generated images may deviate from the user's original intent.

### Solution:

TIPO addresses the above issues in the following ways:

1. **Pre-sampling Technique**: TIPO uses the distribution of the training dataset to optimize the simple prompts provided by users through pre-sampling, without incurring heavy runtime cost.
2. **Efficient and Scalable Prompt Optimization**: TIPO optimizes prompts efficiently while keeping them consistent with the training data distribution, thereby improving the quality of generated images.
3. **Wide Applicability**: The TIPO framework can be used with any T2I model, providing a general solution applicable to different architectures and versions.
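
To make the pre-sampling idea more concrete, here is a minimal toy sketch in Python. It is not TIPO's actual implementation: TIPO uses a trained lightweight language model, whereas this sketch swaps in a simple tag co-occurrence table. The dataset `TOY_PROMPTS` and the helpers `build_cooccurrence` and `extend_prompt` are hypothetical names introduced only for illustration. What it shows is the general principle of extending a short user prompt with terms sampled from the distribution of a prompt dataset, while leaving the user's own terms untouched.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy prompt dataset (an assumption for illustration only).
# A real system would use the prompts or captions the T2I model was trained on.
TOY_PROMPTS = [
    "cat, sitting, window, sunlight, indoors",
    "cat, sleeping, sofa, indoors, soft lighting",
    "dog, running, beach, sunset, outdoors",
    "cat, sitting, garden, outdoors, flowers",
]


def split_tags(prompt):
    """Split a comma-separated prompt into stripped, non-empty tags."""
    return [t.strip() for t in prompt.split(",") if t.strip()]


def build_cooccurrence(prompts):
    """Count how often each pair of distinct tags appears in the same prompt."""
    counts = defaultdict(Counter)
    for prompt in prompts:
        tags = split_tags(prompt)
        for a in tags:
            for b in tags:
                if a != b:
                    counts[a][b] += 1
    return counts


def extend_prompt(user_prompt, counts, n_extra=3, seed=0):
    """Append up to n_extra tags sampled in proportion to co-occurrence counts.

    The user's tags are kept first and unchanged; this stands in for the
    learned language model that a real prompt optimizer would use.
    """
    rng = random.Random(seed)
    result = split_tags(user_prompt)
    for _ in range(n_extra):
        scores = Counter()
        for tag in result:
            scores.update(counts.get(tag, {}))
        for tag in result:
            scores.pop(tag, None)  # never repeat a tag already in the prompt
        if not scores:
            break  # nothing in the dataset relates to the current tags
        candidates, weights = zip(*sorted(scores.items()))
        result.append(rng.choices(candidates, weights=weights, k=1)[0])
    return ", ".join(result)


if __name__ == "__main__":
    table = build_cooccurrence(TOY_PROMPTS)
    print(extend_prompt("cat, sitting", table, n_extra=3, seed=0))
```

Because candidates are drawn only from tags that co-occur with the current prompt in the dataset, the extended prompt stays close to the dataset distribution. A prompt with no known tags is returned unchanged.
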
### Experimental Results:

Experimental results show that TIPO improves aesthetic scores, reduces image distortion, and better aligns generated images with the dataset distribution. These results highlight the critical role of prompt optimization in T2I technology and open new avenues for the widespread application of automatic prompt optimization.

TIPO: Text to Image with Text Presampling for Prompt Optimization
