Adapting LLM Agents with Universal Feedback in Communication

Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
2024-04-14
Abstract: Recent advances in large language models (LLMs) have demonstrated potential for LLM agents. To facilitate the training of these agents with both linguistic feedback and non-linguistic reward signals, we introduce Learning through Communication (LTC). We design a universal buffer to store all the feedback, and an iterative pipeline to enable an LLM agent to explore and update its policy in a given environment. To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments. We evaluate the efficacy of our LTC approach on four diverse datasets: ALFWorld (single-agent), HotpotQA (multi-agent collaboration), Chameleon (multi-agent competition), and GSM8k (multi-agent teacher-student). On these datasets, LTC outperforms the supervised instruction fine-tuning baselines by 3.6% to 12%. These results highlight the versatility and efficiency of LTC in facilitating online adaptation for LLM agents.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to make effective use of both linguistic feedback and non-linguistic reward signals when training large language model (LLM) agents. Existing methods typically use only part of this feedback. In multi-player role-playing games, for example, players generate a large amount of language data, yet explicit reward signals such as victory or defeat at the end of the game are often used only as a filtering criterion rather than as a reinforcement learning objective. To bridge this gap, the authors propose a general framework, Learning Through Communication (LTC), which trains LLM agents on both kinds of signal at once. Specifically, the paper proposes the following innovations:

1. **Learning Through Communication (LTC)**: A general framework for training LLM agents on linguistic feedback and non-linguistic reward signals together. A universal buffer stores all feedback, and an iterative pipeline lets an LLM agent explore and update its policy in a given environment.
2. **Task-specific communication patterns**: The LTC framework allows communication patterns to be designed flexibly for different tasks. The paper introduces three specific patterns: Single-agent Monologue, Multi-agent Dialogue, and Teacher-student Dialogue. These patterns can be combined to generate diverse structured interactions and feedback signals for agent training across various task types.
3. **Empirical research and findings**: Experiments on public benchmark tasks evaluate the effectiveness of LTC. The results show that LTC outperforms instruction-tuning and prompting baselines on multiple benchmark tasks.

Through these innovations, the paper demonstrates the effectiveness and versatility of the LTC framework for online adaptation of LLM agents in both single-agent and multi-agent environments.
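
To make the idea of a universal buffer plus an iterative explore-and-update pipeline more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Record` schema, the `UniversalBuffer` methods, and the `run_iterations` helper are all hypothetical names chosen for illustration, and the `update` step is a placeholder rather than real model training. Keeping the buffer across iterations is also an assumption made only for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Record:
    """One stored message (hypothetical schema, not taken from the paper)."""
    speaker: str
    text: str
    reward: Optional[float] = None  # None when only linguistic feedback exists


@dataclass
class UniversalBuffer:
    """Holds linguistic feedback and reward-bearing records side by side."""
    records: List[Record] = field(default_factory=list)

    def add(self, speaker: str, text: str, reward: Optional[float] = None) -> None:
        self.records.append(Record(speaker, text, reward))

    def linguistic(self) -> List[Record]:
        # Records that carry text only.
        return [r for r in self.records if r.reward is None]

    def rewarded(self) -> List[Record]:
        # Records that also carry a scalar reward signal.
        return [r for r in self.records if r.reward is not None]


def run_iterations(
    explore: Callable[[UniversalBuffer], None],
    update: Callable[[UniversalBuffer], None],
    n_iters: int,
) -> UniversalBuffer:
    """Alternate exploration (fill the buffer) and updating (consume it)."""
    buffer = UniversalBuffer()
    for _ in range(n_iters):
        explore(buffer)  # agent interacts with the environment or other agents
        update(buffer)   # placeholder for a policy update on the buffered data
    return buffer


# Toy stand-ins for the two phases; both are assumptions for illustration.
def toy_explore(buffer: UniversalBuffer) -> None:
    buffer.add("agent", "think: the mug is probably in the kitchen")
    buffer.add("env", "You picked up the mug.", reward=1.0)


update_sizes: List[int] = []


def toy_update(buffer: UniversalBuffer) -> None:
    # A real update would train the model; here we only log the buffer size.
    update_sizes.append(len(buffer.records))


buf = run_iterations(toy_explore, toy_update, n_iters=2)
```

In this sketch, text-only messages and reward-bearing messages share one container, so a single update step can read both. How the two signals are actually combined in a training objective is described in the paper itself and is not modeled here.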