Abstract: Prompt optimization aims to find the best prompt for a large language model (LLM) on a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors, (2) the impact of an individual step is difficult to evaluate, and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework, PRompt Optimization in Multi-Step Tasks (PROMST), that incorporates human-designed feedback rules to automatically offer direct suggestions for improvement. We also use an extra learned heuristic model that predicts prompt performance to efficiently sample from prompt candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across 11 representative multi-step tasks (an average improvement of 10.6%-29.3% over the current best methods on five LLMs, respectively). We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks. Datasets and code are available at <a class="link-external link-https" href="https://github.com/yongchao98/PROMST" rel="external noopener nofollow">this https URL</a>. The project page is available at <a class="link-external link-https" href="https://yongchao98.github.io/MIT-REALM-PROMST" rel="external noopener nofollow">this https URL</a>.
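To make the two ingredients named in the abstract concrete (human-designed feedback rules and a heuristic that pre-filters prompt candidates), the following is a minimal sketch of one optimization step. It is not the PROMST implementation: every name here (`rule_syntax_error`, `rule_stuck_in_loop`, `propose_candidates`, `select_top_k`, `optimize_step`, and the `toy_*` helpers) is hypothetical, the LLM call is replaced by a deterministic stand-in, and the learned score model is replaced by a crude length-based guess.

```python
from typing import Callable, Dict, List, Optional

Trace = Dict[str, object]
FeedbackRule = Callable[[Trace], Optional[str]]


# Hypothetical human-designed feedback rules: each inspects an execution
# trace and returns a suggestion, or None if the rule does not fire.
def rule_syntax_error(trace: Trace) -> Optional[str]:
    if trace.get("syntax_error"):
        return "Specify the exact action format."
    return None


def rule_stuck_in_loop(trace: Trace) -> Optional[str]:
    if trace.get("repeated_actions", 0) >= 3:
        return "Avoid repeating the same action."
    return None


RULES: List[FeedbackRule] = [rule_syntax_error, rule_stuck_in_loop]


def collect_feedback(trace: Trace, rules: List[FeedbackRule] = RULES) -> List[str]:
    """Run every rule on the trace and keep the suggestions that fired."""
    return [msg for rule in rules if (msg := rule(trace)) is not None]


def propose_candidates(prompt: str, feedback: List[str]) -> List[str]:
    """Stand-in for an LLM that rewrites the prompt given the feedback."""
    candidates = [f"{prompt} {msg}" for msg in feedback]
    if len(feedback) > 1:
        candidates.append(prompt + " " + " ".join(feedback))
    return candidates


def select_top_k(candidates: List[str],
                 predict_score: Callable[[str], float],
                 k: int) -> List[str]:
    """Keep the k candidates the cheap score predictor ranks highest."""
    return sorted(candidates, key=predict_score, reverse=True)[:k]


def optimize_step(prompt: str,
                  run_task: Callable[[str], Trace],
                  evaluate: Callable[[str], float],
                  predict_score: Callable[[str], float],
                  k: int = 2) -> str:
    """One round: run, gather feedback, propose, pre-filter, evaluate."""
    feedback = collect_feedback(run_task(prompt))
    if not feedback:
        return prompt  # no rule fired, nothing to improve on
    shortlisted = select_top_k(propose_candidates(prompt, feedback), predict_score, k)
    # Only the shortlisted candidates pay the cost of a real evaluation.
    return max([prompt] + shortlisted, key=evaluate)


# Toy environment (assumptions for illustration only).
def toy_run_task(prompt: str) -> Trace:
    return {
        "syntax_error": "format" not in prompt,
        "repeated_actions": 0 if "repeating" in prompt else 3,
    }


def toy_evaluate(prompt: str) -> float:
    return float(("format" in prompt) + ("repeating" in prompt))


def toy_predict_score(prompt: str) -> float:
    return float(len(prompt))  # crude placeholder for a learned model


best = optimize_step("Move the boxes.", toy_run_task, toy_evaluate, toy_predict_score)
```

In this toy setting the predictor is only a placeholder; the point of the design is that expensive multi-step task evaluation is reserved for candidates that a cheap model already ranks well.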

Robust Prompt Optimization for Large Language Models Against Distribution Shifts

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

Automatic Prompt Selection for Large Language Models

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Are Large Language Models Good Prompt Optimizers?

Efficient Prompting Methods for Large Language Models: A Survey

PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models

Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers

SPRIG: Improving Large Language Model Performance by System Prompt Optimization

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey

Large Language Models as Optimizers

Large Language Models are Good Multi-lingual Learners: When LLMs Meet Cross-lingual Prompts

Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling

GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Model

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Learning from Contrastive Prompts: Automated Optimization and Adaptation