SPRIG: Improving Large Language Model Performance by System Prompt Optimization

Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens
2024-10-25
Abstract: Large Language Models (LLMs) have shown impressive capabilities in many scenarios, but their performance depends, in part, on the choice of prompt. Past research has focused on optimizing prompts specific to a task. However, much less attention has been given to optimizing the general instructions included in a prompt, known as a system prompt. To address this gap, we propose SPRIG, an edit-based genetic algorithm that iteratively constructs prompts from prespecified components to maximize the model's performance in general scenarios. We evaluate the performance of system prompts on a collection of 47 different types of tasks to ensure generalizability. Our study finds that a single optimized system prompt performs on par with task prompts optimized for each individual task. Moreover, combining system and task-level optimizations leads to further improvement, which showcases their complementary nature. Experiments also reveal that the optimized system prompts generalize effectively across model families, parameter sizes, and languages. This study provides insights into the role of system-level instructions in maximizing LLM potential.
Computation and Language, Artificial Intelligence, Human-Computer Interaction, Machine Learning
What problem does this paper attempt to address?
This paper asks how to improve the performance of large language models (LLMs) across many tasks by optimizing the system prompt, the set of general instructions that precedes any task-specific details. Existing research mainly optimizes prompts for specific tasks and pays much less attention to system prompts. The paper proposes SPRIG (System Prompt Refinement for Increased Generalization), a genetic-algorithm-based method that iteratively constructs and optimizes system prompts to maximize the model's performance in general scenarios. The main contributions of the paper are:
1. Optimizing system prompts yields performance improvements comparable to task-specific optimization, even though these prompts contain only general instructions.
2. System prompt optimization and task prompt optimization are complementary; combining them improves performance further.
3. The optimized system prompts generalize well across model families, parameter sizes, and languages.
Through this research, the paper provides new insights into the role of system-level instructions in maximizing the potential of LLMs.
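To make the idea of an edit-based genetic search over prompt components concrete, here is a minimal Python sketch. It is not the authors' implementation: the component pool, the edit set (add, delete, swap), the population settings, and especially `toy_score` are all assumptions made for illustration. In a real setting the scoring function would measure model performance on a benchmark of tasks rather than use fixed weights.

```python
import random

# Hypothetical component pool; the paper's actual components are not given here.
COMPONENTS = [
    "You are a helpful assistant.",
    "Think step by step.",
    "Double-check your answer.",
    "Answer concisely.",
]

# Made-up per-component weights, used only by the stand-in fitness below.
TOY_WEIGHTS = dict(zip(COMPONENTS, [0.1, 0.5, 0.3, -0.2]))


def toy_score(prompt):
    """Stand-in fitness (assumption): a real run would evaluate the LLM on tasks."""
    return sum(TOY_WEIGHTS[c] for c in prompt) - 0.05 * len(prompt)


def mutate(prompt, rng):
    """Apply one random edit (add, delete, or swap) to a tuple of components."""
    prompt = list(prompt)
    op = rng.choice(["add", "delete", "swap"])
    unused = [c for c in COMPONENTS if c not in prompt]
    if op == "add" and unused:
        prompt.insert(rng.randrange(len(prompt) + 1), rng.choice(unused))
    elif op == "delete" and prompt:
        prompt.pop(rng.randrange(len(prompt)))
    elif op == "swap" and len(prompt) >= 2:
        i, j = rng.sample(range(len(prompt)), 2)
        prompt[i], prompt[j] = prompt[j], prompt[i]
    return tuple(prompt)


def optimize(score_fn, generations=20, population_size=4,
             children_per_parent=4, seed=0):
    """Iteratively edit prompts and keep the highest-scoring candidates."""
    rng = random.Random(seed)
    population = [()]  # start from an empty system prompt
    for _ in range(generations):
        candidates = set(population)  # parents survive, so the best score never drops
        for parent in population:
            for _ in range(children_per_parent):
                candidates.add(mutate(parent, rng))
        # Sort by score, breaking ties by prompt content for determinism.
        ranked = sorted(candidates, key=lambda p: (score_fn(p), p), reverse=True)
        population = ranked[:population_size]
    return population[0]


best = optimize(toy_score)
print(" ".join(best))
```

Because parents are carried into each new candidate set, the best score is non-decreasing across generations. The toy fitness ignores component order, so swap edits only matter once a real, order-sensitive evaluation is plugged in as `score_fn`.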