StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models

Minchan Kwon, Gaeun Kim, Jongsuk Kim, Haeil Lee, Junmo Kim

2024-10-10

Abstract: Finding appropriate prompts for specific tasks has become an important issue as the use of Large Language Models (LLMs) has expanded. Reinforcement Learning (RL) is widely used for prompt tuning, but its inherent instability and environmental dependency make it difficult to use in practice. In this paper, we propose StablePrompt, which strikes a balance between training stability and search space, mitigating the instability of RL and producing high-performance prompts. We formulate prompt tuning as an online RL problem between the agent and the target LLM and introduce Adaptive Proximal Policy Optimization (APPO). APPO introduces an LLM anchor model to adaptively adjust the rate of policy updates. This allows for flexible prompt search while preserving the linguistic ability of the pre-trained LLM. StablePrompt outperforms previous methods on various tasks, including text classification, question answering, and text generation. Our code is available on GitHub.
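The abstract only states that APPO uses an anchor LLM to regulate how fast the policy moves. The sketch below is one plausible reading, not the authors' code: it assumes a standard PPO clipped surrogate plus a KL-style penalty toward the anchor, and a hypothetical rule that refreshes the anchor to the current policy every few steps. The names `Sample`, `appo_style_loss`, `maybe_refresh_anchor`, `clip_eps`, `beta` and `every` are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One sampled prompt token (hypothetical container for this sketch)."""
    logp_new: float     # log-prob under the current agent policy
    logp_old: float     # log-prob under the policy that generated the sample
    logp_anchor: float  # log-prob under the anchor LLM (assumed frozen between refreshes)
    advantage: float    # advantage estimate derived from the target LLM's reward


def appo_style_loss(samples: List[Sample], clip_eps: float = 0.2, beta: float = 0.1) -> float:
    """PPO clipped surrogate plus a KL-style penalty toward an anchor model.

    Returns a loss to minimize (the negated objective), averaged over samples.
    The anchor penalty is an assumption about how APPO works, not a confirmed detail.
    """
    total = 0.0
    for s in samples:
        ratio = math.exp(s.logp_new - s.logp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * s.advantage, clipped * s.advantage)
        # Crude per-token estimate of KL(policy || anchor); samples come from the
        # behaviour policy, which is assumed to be close to the current one.
        kl_est = s.logp_new - s.logp_anchor
        total += -(surrogate - beta * kl_est)
    return total / len(samples)


def maybe_refresh_anchor(anchor_params: Dict, policy_params: Dict, step: int, every: int) -> Dict:
    """Hypothetical refresh rule: copy the policy into the anchor every `every` steps."""
    if step > 0 and step % every == 0:
        return dict(policy_params)
    return anchor_params
```

Under these assumptions, penalizing distance to a periodically refreshed anchor (rather than to the original pre-trained model) lets the policy travel far over many refreshes while each update stays near a model that still writes fluent text, which is one way to read the stability versus search-space trade-off described above.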

Computation and Language

What problem does this paper attempt to address?

The paper addresses how to automatically find or adjust the best prompt for a specific task when using large language models (LLMs). Specifically, it optimizes prompts with reinforcement learning (RL) while overcoming two weaknesses of existing RL methods: unstable training and a strong dependence on the environment. Both limit the use of RL across different LLMs and tasks. The authors propose StablePrompt, which aims to keep training stable while preserving a flexible search space. By formulating prompt tuning as an online, policy-based RL problem and introducing Adaptive Proximal Policy Optimization (APPO), StablePrompt achieves high performance on diverse tasks (text classification, question answering, and text generation) and applies to agent models and target LLMs of different scales and types. The paper also proposes an extension, TTE-StablePrompt, which generates input-dependent prompts for tasks that are difficult to solve with a single prompt.
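To make the online RL formulation concrete, here is a toy sketch of the interaction loop: the agent samples a prompt, a frozen target model answers a small labeled set under that prompt, and the resulting accuracy is the reward. Everything here is an illustrative assumption: `CANDIDATE_PROMPTS`, `DATASET` and `toy_target_llm` are hypothetical, the agent picks from a fixed list instead of generating tokens with an LLM, and the update is plain REINFORCE with a running-average baseline, swapped in for APPO to keep the example short.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical discrete action space: the agent picks one of a few candidate prompts.
CANDIDATE_PROMPTS = ["Translate:", "Summarize:", "Classify sentiment:"]

# Tiny labeled dataset used to score a prompt (illustrative only).
DATASET: List[Tuple[str, str]] = [
    ("I loved this movie", "positive"),
    ("great acting and story", "positive"),
    ("boring and too long", "negative"),
    ("I hated the ending", "negative"),
]


def toy_target_llm(prompt: str, text: str) -> str:
    """Stand-in for a frozen target LLM: only the sentiment prompt elicits a real answer."""
    if prompt == "Classify sentiment:":
        positive_words = {"loved", "great"}
        return "positive" if positive_words & set(text.lower().split()) else "negative"
    return "positive"  # unhelpful prompts yield a constant guess


def reward(prompt: str) -> float:
    """Task reward = accuracy of the target model when conditioned on `prompt`."""
    correct = sum(toy_target_llm(prompt, x) == y for x, y in DATASET)
    return correct / len(DATASET)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 1000, lr: float = 0.2, seed: int = 0) -> Dict[str, float]:
    """Online loop: sample a prompt, query the target, update the policy (REINFORCE)."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATE_PROMPTS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(CANDIDATE_PROMPTS)), weights=probs)[0]
        r = reward(CANDIDATE_PROMPTS[a])
        advantage = r - baseline
        # Gradient of log softmax w.r.t. the logits is (one_hot(a) - probs).
        for i in range(len(logits)):
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
        baseline += 0.1 * (r - baseline)  # running-average baseline
    return dict(zip(CANDIDATE_PROMPTS, softmax(logits)))
```

Calling `train()` returns the final probability of each candidate prompt; in this toy setup the mass should concentrate on `"Classify sentiment:"`, the only prompt that earns full accuracy. The reward is computed online from the target's outputs at every step, which is the property the formulation relies on; the instability the paper targets shows up when the action space is the full token vocabulary rather than three fixed strings.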
