Abstract: Diffusion-based generative models have shown remarkable capability in producing high-fidelity visual content such as images and videos. However, their performance depends heavily on the quality of the textual inputs, commonly referred to as "prompts". Traditional prompt engineering requires empirical expertise and is challenging for inexperienced users. In this paper, we introduce PromptCoT, an enhancer that autonomously refines prompts for users. PromptCoT is based on the observation that prompts resembling the textual information of high-quality images seen during training lead to superior generation performance. We therefore fine-tune a Large Language Model (LLM) on a curated text dataset comprising descriptions of high-quality visual content. The LLM can thus capture the distribution of high-quality texts, enabling it to enhance the original texts. Nonetheless, one drawback of LLMs is their tendency to generate irrelevant information. We employ a tailored Chain-of-Thought (CoT) mechanism to address this problem. Our CoT extracts and combines crucial information from the prompt candidates, then reasons over contextual cues to produce a more comprehensive and nuanced output. For computational efficiency, instead of allocating a dedicated LLM to each individual model or dataset, we integrate adapters that facilitate task-specific adaptation on top of a shared LLM. Because the adapters are fine-tuned independently, PromptCoT can be adapted to new datasets with minimal additional training cost and memory usage. We evaluate PromptCoT on widely used latent diffusion models for visual generation. The results demonstrate significant improvements in key performance metrics.
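
The shared-LLM-plus-adapters idea in the abstract can be illustrated with a toy sketch. It is not PromptCoT's architecture: the abstract does not specify the adapter design, so `SharedBase`, `Adapter`, and `AdaptedModel` are hypothetical names, the "LLM" is reduced to a frozen weight vector, and the adapter is an assumed per-weight scale and shift. The sketch only shows the property the abstract relies on: training one adapter leaves the shared base and every other adapter untouched.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SharedBase:
    """Toy stand-in for the frozen shared LLM (assumption: one weight vector)."""
    weights: Tuple[float, ...]


@dataclass
class Adapter:
    """Small per-dataset parameter set (assumption: a scale and shift per weight)."""
    scale: List[float]
    shift: List[float]


class AdaptedModel:
    def __init__(self, base: SharedBase) -> None:
        self.base = base
        self.adapters: Dict[str, Adapter] = {}

    def add_adapter(self, name: str) -> None:
        # Identity initialisation: a fresh adapter reproduces the base output.
        n = len(self.base.weights)
        self.adapters[name] = Adapter(scale=[1.0] * n, shift=[0.0] * n)

    def forward(self, name: str, x: List[float]) -> float:
        a = self.adapters[name]
        return sum(
            (w * s + b) * xi
            for w, s, b, xi in zip(self.base.weights, a.scale, a.shift, x)
        )

    def train_step(self, name: str, x: List[float], target: float,
                   lr: float = 0.1) -> float:
        """One gradient step on 0.5 * (output - target)^2, updating only `name`."""
        a = self.adapters[name]
        err = self.forward(name, x) - target
        for i, (w, xi) in enumerate(zip(self.base.weights, x)):
            a.shift[i] -= lr * err * xi        # d(loss)/d(shift_i) = err * x_i
            a.scale[i] -= lr * err * w * xi    # d(loss)/d(scale_i) = err * w_i * x_i
        return 0.5 * err * err


base = SharedBase(weights=(1.0, 2.0))
model = AdaptedModel(base)
model.add_adapter("dataset_a")  # hypothetical dataset names
model.add_adapter("dataset_b")

x = [1.0, 1.0]
loss_before = model.train_step("dataset_a", x, target=5.0)
loss_after = model.train_step("dataset_a", x, target=5.0)
```

Only the two small lists inside the `dataset_a` adapter change during these steps, which is why adding a new dataset costs one extra adapter rather than a second copy of the base model.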

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

Optimizing Prompts for Text-to-Image Generation

Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

Dynamic Prompt Optimizing for Text-to-Image Generation

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models

What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

Best Prompts for Text-to-Image Models and How to Find Them

AK4Prompts: Aesthetics-driven Automatically Keywords-Ranking for Prompts in Text-To-Image Models

User-Friendly Customized Generation with Multi-Modal Prompts

Capability-aware Prompt Reformulation Learning for Text-to-Image Generation

RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation

PromptCoT: Align Prompt Distribution Via Adapted Chain-of-Thought