CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

Han He, Qianchu Liu, Lei Xu, Chaitanya Shivade, Yi Zhang, Sundararajan Srinivasan, Katrin Kirchhoff
2024-10-10
Abstract: Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4% ROUGE score improvement on summarization and substantial improvement of various metrics on QA.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the lack of automatic prompt optimization methods suited to text generation tasks. Existing automatic prompt engineering methods are designed mainly for discriminative tasks and refine prompts iteratively using feedback from a single metric. They work poorly for generation tasks, which need more nuanced guidance than a single numeric score and require several aspects of the generated text to be optimized at once.
To address this, the authors propose CriSPO, a multi-aspect Critique-Suggestion-guided automatic Prompt Optimization method. Its core component is a critique-suggestion module that spontaneously discovers aspects, compares the generated text with the reference text across those aspects, and gives specific suggestions for modifying the prompt. These clear critiques and actionable suggestions guide a receptive optimizer module, which can then make more substantial changes and explore a broader, more effective search space. To improve multi-metric optimization, the authors also introduce an Automatic Suffix Tuning (AST) extension that enhances the performance of task prompts across multiple metrics.
The main contributions of the paper are:
1. CriSPO, an automatic prompt engineering technique designed specifically for generation tasks. It discovers multiple aspects along which to critique generated texts and writes suggestions that support more effective prompt revision.
2. Comprehensive experiments on multiple large language models (LLMs) and datasets that demonstrate the effectiveness and robustness of the method. CriSPO improves ROUGE scores by an average of 3-4% on summarization and yields substantial improvements on various metrics in question-answering tasks.
3. AST, which allows prompts to be tuned for multiple metrics. Experiments show that CriSPO combined with AST can simultaneously optimize AlignScore (which evaluates faithfulness) and ROUGE (which evaluates similarity to the reference).
Together, these contributions offer a new approach to automatic prompt optimization for generation tasks, one that improves task performance while reducing the manual effort of prompt engineering.
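The critique-then-revise loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `optimize_prompt`, the callables `generate`, `critique`, and `rewrite` (stand-ins for LLM calls), the history format, and the `unigram_f1` score (a crude placeholder for ROUGE) are all assumptions made for illustration, and the AST extension is not covered.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, reference output)


def unigram_f1(output: str, reference: str) -> float:
    """Toy unigram-overlap F1; a placeholder for a real metric such as ROUGE."""
    out, ref = Counter(output.split()), Counter(reference.split())
    overlap = sum((out & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(out.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def optimize_prompt(
    seed_prompt: str,
    train: List[Example],
    generate: Callable[[str, str], str],  # hypothetical: (prompt, input) -> output
    critique: Callable[[str, List[Tuple[str, str, str]]], str],  # hypothetical critic
    rewrite: Callable[[str, str, List[Tuple[str, float]]], str],  # hypothetical optimizer
    score: Callable[[str, str], float] = unigram_f1,
    steps: int = 3,
) -> Tuple[str, float]:
    """Iteratively revise a task prompt using textual critiques and suggestions."""

    def evaluate(prompt: str) -> Tuple[List[str], float]:
        outputs = [generate(prompt, x) for x, _ in train]
        mean = sum(score(o, ref) for o, (_, ref) in zip(outputs, train)) / len(train)
        return outputs, mean

    current, (current_outputs, best_score) = seed_prompt, evaluate(seed_prompt)
    best_prompt = seed_prompt
    history = [(seed_prompt, best_score)]  # past prompts with their scores

    for _ in range(steps):
        # The critic sees (input, generated, reference) triples and returns
        # free-text critiques and suggestions rather than a single number.
        triples = [(x, o, ref) for (x, ref), o in zip(train, current_outputs)]
        feedback = critique(current, triples)
        # The optimizer proposes a new prompt from the feedback and the history.
        candidate = rewrite(current, feedback, history)
        candidate_outputs, candidate_score = evaluate(candidate)
        history.append((candidate, candidate_score))
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
        current, current_outputs = candidate, candidate_outputs

    return best_prompt, best_score
```

In practice the three callables would wrap LLM calls with their own meta-prompts; here they are left abstract so that the control flow (generate, critique against references, rewrite, keep the best-scoring prompt) is the only thing the sketch commits to.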