PromptCoT: Align Prompt Distribution Via Adapted Chain-of-Thought
Junyi Yao, Yijiang Liu, Zhen Dong, Mingfei Guo, Helan Hu, Kurt Keutzer, Li Du, Daquan Zhou, Shanghang Zhang
DOI: https://doi.org/10.1109/cvpr52733.2024.00671
2024-01-01
Abstract: Diffusion-based generative models have exhibited remarkable capability in the production of high-fidelity visual content such as images and videos. However, their performance is significantly contingent upon the quality of textual inputs, commonly referred to as 'prompts'. Traditional prompt engineering requires empirical expertise and poses challenges for inexperienced users. In this paper, we introduce PromptCoT, an innovative enhancer that autonomously refines prompts for users. PromptCoT is designed based on the observation that prompts that resemble the textual information of high-quality images seen during training lead to superior generation performance. Therefore, we fine-tune a Large Language Model (LLM) using a curated text dataset that comprises descriptions of high-quality visual content. Consequently, the LLM can capture the distribution of high-quality texts, enabling it to improve the original texts. Nonetheless, one drawback of LLMs is their tendency to generate irrelevant information. We employ a tailored Chain-of-Thought (CoT) mechanism to address the problem. Our CoT can extract and amalgamate crucial information from the prompt candidates, enabling a reasoning process based on contextual cues to produce a more comprehensive and nuanced output. For computational efficiency, instead of allocating a dedicated LLM to each individual model or dataset, we integrate adapters that facilitate task-specific adaptation, leveraging a shared LLM as the foundation for this process. With independent fine-tuning of adapters, we can adapt PromptCoT to new datasets while minimally increasing training costs and memory usage. We evaluate the effectiveness of PromptCoT on widely used latent diffusion models for visual generation. The results demonstrate significant improvements in key performance metrics.