Abstract: Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4% ROUGE score improvement on summarization and substantial improvement of various metrics on QA.

What problem does this paper attempt to address?

This paper addresses the lack of automatic prompt optimization methods for text generation tasks. Existing automatic prompt engineering methods are designed mainly for discriminative tasks and refine prompts iteratively using feedback from a single metric. They work poorly for generation tasks, which need more detailed guidance than a single number can provide and which require several aspects of the generated text to be optimized at once.

To address this, the authors propose CriSPO, an automatic prompt optimization method guided by multi-aspect critiques and suggestions. Its core component is the critique-suggestion module, which spontaneously discovers aspects, compares the generated text with the reference text across those aspects, and provides specific suggestions for modifying the prompt. These explicit critiques and actionable suggestions guide a receptive optimizer module, allowing it to make more substantial changes and explore a broader and more effective search space. To further improve CriSPO under multi-metric optimization, the authors also introduce the Automatic Suffix Tuning (AST) extension, which enhances the performance of task prompts across multiple metrics.

The main contributions of the paper are:

1. Proposing CriSPO, an automatic prompt engineering technique designed specifically for generation tasks. It discovers multiple aspects along which to critique generated texts and writes suggestions that support more effective prompt revision.
2. Conducting comprehensive experiments on multiple large language models (LLMs) and datasets that demonstrate the effectiveness and robustness of the method. The results show that CriSPO improves ROUGE scores by 3-4% on summarization and yields substantial improvements on various metrics in question-answering tasks.
3. Proposing AST, which enables prompts to be tuned for multiple metrics. Experiments show that CriSPO combined with AST can simultaneously optimize AlignScore (used to evaluate faithfulness) and ROUGE (used to evaluate similarity to the reference).

Through these contributions, CriSPO offers a new approach to automatic prompt optimization for generation tasks, one that improves task performance while reducing the manual effort of prompt engineering.
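The critique-then-revise loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`optimize_prompt`, `toy_generate`, `toy_critique`, `toy_rewrite`) are hypothetical, the LLM calls are replaced by deterministic toy functions, a simple unigram-overlap F1 stands in for ROUGE, and the greedy accept-if-better rule is a simplification of whatever search strategy the paper actually uses.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, reference output)


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy stand-in for a ROUGE-style metric: unigram-overlap F1."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def optimize_prompt(
    seed_prompt: str,
    examples: List[Example],
    generate: Callable[[str, str], str],                # (prompt, input) -> output
    critique: Callable[[str, List[str], List[str]], str],  # -> critique + suggestion text
    rewrite: Callable[[str, str], str],                 # (prompt, feedback) -> new prompt
    steps: int = 3,
) -> Tuple[str, float]:
    """Greedy loop: critique the outputs, rewrite the prompt, keep it if the score improves."""

    def score(prompt: str) -> float:
        return sum(unigram_f1(generate(prompt, x), y) for x, y in examples) / len(examples)

    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(steps):
        outputs = [generate(best_prompt, x) for x, _ in examples]
        references = [y for _, y in examples]
        # Textual feedback carries more signal than the numeric score alone.
        feedback = critique(best_prompt, outputs, references)
        candidate = rewrite(best_prompt, feedback)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score


# Hypothetical toy stand-ins for the three LLM roles (assumptions, for illustration only).
def toy_generate(prompt: str, text: str) -> str:
    words = text.split()
    return " ".join(words[:3]) if "brief" in prompt else text


def toy_critique(prompt: str, outputs: List[str], references: List[str]) -> str:
    avg_out = sum(len(o.split()) for o in outputs) / len(outputs)
    avg_ref = sum(len(r.split()) for r in references) / len(references)
    if avg_out > avg_ref:
        return "Aspect: length. Outputs are longer than references. Suggestion: ask for a brief summary."
    return "No issues found."


def toy_rewrite(prompt: str, feedback: str) -> str:
    if "brief" in feedback and "brief" not in prompt:
        return prompt + " Keep it brief."
    return prompt


examples = [
    ("the cat sat on the mat today", "the cat sat"),
    ("a dog ran in the park", "a dog ran"),
]
best_prompt, best_score = optimize_prompt(
    "Summarize the text.", examples, toy_generate, toy_critique, toy_rewrite
)
print(best_prompt, round(best_score, 3))
```

In a real setting, each of the three callables would wrap an LLM call, and the critique would name several aspects (for example length, style, or level of detail) rather than the single length check used here.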

CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Task Facet Learning: A Structured Approach to Prompt Optimization

MOPO: Multi-Objective Prompt Optimization for Affective Text Generation

AMPO: Automatic Multi-Branched Prompt Optimization

PGSO: Prompt-based Generative Sequence Optimization Network for Aspect-based Sentiment Analysis

SCULPT: Systematic Tuning of Long Prompts

StraGo: Harnessing Strategic Guidance for Prompt Optimization

Reviewer2: Optimizing Review Generation Through Prompt Generation

Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts

iPrOp: Interactive Prompt Optimization for Large Language Models with a Human in the Loop

Learning from Contrastive Prompts: Automated Optimization and Adaptation

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

MPrompt: Exploring Multi-level Prompt Tuning for Machine Reading Comprehension

A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization

SentiPrompt: Sentiment Knowledge Enhanced Prompt-Tuning for Aspect-Based Sentiment Analysis

Automatic Prompt Optimization with "Gradient Descent" and Beam Search

GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering

GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning

Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation