Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

Mohamed Salim Aissi, Clement Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas Thome
2024-10-29
Abstract: Reinforcement learning (RL) is a promising approach for aligning large language models (LLMs) knowledge with sequential decision-making tasks. However, few studies have thoroughly investigated the impact on LLM agents capabilities of fine-tuning them with RL in a specific environment. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when faced with prompt formulations different from those used during the RL training phase. Besides, we analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose to use a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the sensitivity of large language models (LLMs) to prompt formulation after fine-tuning with reinforcement learning (RL). Specifically, it finds that LLMs trained with RL in a specific environment suffer a significant performance drop when faced with prompt forms that differ from those seen during training. This phenomenon is called "prompt overfitting". The main objective of the paper is to analyze the causes of this overfitting and to propose a way to improve the generalization and robustness of LLMs.

### Main research questions:

1. **Prompt sensitivity**: How sensitive are LLMs to different prompt forms, and how does this sensitivity affect their ability to generalize across prompt forms?
2. **State representation**: How do LLMs encode the state space in their hidden representations, and what is the topological structure of these representations?
3. **The influence of prompt information on action selection**: After fine-tuning with multiple prompts, which parts of the prompt do LLMs focus on when completing tasks?

### Solutions:

- **Contrastive learning loss**: To alleviate prompt overfitting, the paper proposes a contrastive learning loss that aims to make the hidden representations of LLMs invariant to the prompt form. This is intended to improve zero-shot performance and robustness to prompt changes, while also improving the model's ability to acquire new knowledge in the environment.

### Experimental setup:

- **Environment**: The experiments were carried out in two textual environments: BabyAI-Text and TWC-Medium.
- **Prompt design**: Four prompt forms (P0, P1, P2, P3) were defined, each providing a different combination of goals, possible actions, inventories, and text observations.
- **Training and evaluation**: Multiple LLMs (such as Flan-T5 and GPT-Neo) were trained, and their performance under the different prompt forms was evaluated both during training and at test time.

### Main findings:

- **Prompt sensitivity**: LLMs without fine-tuning perform poorly in zero-shot scenarios. LLMs fine-tuned with a single prompt form perform well under that form but show a significant performance drop under the other prompt forms.
- **State representation**: LLMs tend to cluster prompts by prompt form rather than by content, which further confirms the prompt overfitting phenomenon.
- **The importance of prompt information**: LLMs focus on different parts of the prompt under different prompt forms, and this shift is related to the observed performance changes.

### Conclusion:

Through detailed experiments and analysis, the paper reveals that LLMs become sensitive to, and overfit on, the prompt form after RL fine-tuning, and it proposes a contrastive learning loss to alleviate this problem. The proposed loss improves the generalization of LLMs across prompt forms and also enhances the robustness and adaptability of the model in the interactive environment.
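
To make the contrastive idea more concrete, the following is a minimal sketch of one common way to encourage prompt-invariant representations: a triplet-style hinge loss that pulls together the representations of the same state written in two prompt forms and pushes apart the representations of different states written in the same form. The summary does not give the exact form of the paper's loss, so the Euclidean distance, the margin of 1.0, the function names (`triplet_contrastive_loss`, `batch_loss`) and the toy 2-D vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_contrastive_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss on one triplet (assumed form, not taken from the paper).

    anchor:   representation of state s written in prompt form p
    positive: representation of the same state s in another form q
    negative: representation of a different state s' in the same form p
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_loss(reps, margin=1.0):
    """Average the triplet loss over all valid triplets.

    reps maps state id -> {prompt form id -> representation vector}.
    """
    losses = []
    for s, by_form in reps.items():
        for p, anchor in by_form.items():
            for q, positive in by_form.items():
                if q == p:
                    continue
                for s2, other in reps.items():
                    if s2 == s or p not in other:
                        continue
                    losses.append(
                        triplet_contrastive_loss(anchor, positive, other[p], margin)
                    )
    return sum(losses) / len(losses) if losses else 0.0


# Toy hidden representations (hypothetical 2-D vectors).
# Case 1: points cluster by prompt form (P0 near x=0, P1 near x=5),
# which mirrors the "prompt overfitting" geometry described above.
clustered_by_form = {
    "s1": {"P0": [0.0, 0.0], "P1": [5.0, 0.0]},
    "s2": {"P0": [0.0, 1.0], "P1": [5.0, 1.0]},
}
# Case 2: points cluster by state, regardless of prompt form.
clustered_by_state = {
    "s1": {"P0": [0.0, 0.0], "P1": [0.0, 0.1]},
    "s2": {"P0": [5.0, 0.0], "P1": [5.0, 0.1]},
}

loss_form = batch_loss(clustered_by_form)
loss_state = batch_loss(clustered_by_state)
print(loss_form, loss_state)
```

In this toy setup the loss is large when representations group by prompt form and drops to zero once they group by state, which is the behavior a prompt-invariance objective is meant to reward. In practice such a term would be computed on the LLM's hidden states and combined with the RL objective; how the two are weighted is not specified in this summary.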