Abstract: Recent advances in large language models (LLMs) have demonstrated potential for LLM agents. To facilitate the training of these agents with both linguistic feedback and non-linguistic reward signals, we introduce Learning through Communication (LTC). We design a universal buffer to store all the feedback, and an iterative pipeline to enable an LLM agent to explore and update its policy in a given environment. To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments. We evaluate the efficacy of our LTC approach on four diverse datasets: ALFWorld (single-agent), HotpotQA (multi-agent collaboration), Chameleon (multi-agent competition), and GSM8k (multi-agent teacher-student). On these datasets, LTC outperforms the supervised instruction fine-tuning baselines by 3.6% to 12%. These results highlight the versatility and efficiency of LTC in facilitating online adaptation for LLM agents.

What problem does this paper attempt to address?

This paper addresses how to use both linguistic feedback and non-linguistic reward signals effectively when training large language model (LLM) agents. Existing methods usually exploit only part of this feedback. In multi-player role-playing games, for example, players generate a large amount of language data, yet explicit reward signals such as winning or losing at the end of the game are often used only as a filtering criterion rather than as an objective for reinforcement learning. To bridge this gap, the authors propose a general framework, Learning Through Communication (LTC), which trains LLM agents on linguistic feedback and non-linguistic reward signals at the same time. Specifically, the paper makes the following contributions:

1. **Learning Through Communication (LTC)**: A general framework for training LLM agents so that they can handle linguistic feedback and non-linguistic reward signals simultaneously. A universal buffer stores all feedback, and an iterative pipeline lets an LLM agent explore and update its policy in a given environment.
2. **Task-specific communication patterns**: The LTC framework allows communication patterns to be designed flexibly for different tasks. The paper introduces three specific patterns: Single-agent Monologue, Multi-agent Dialogue, and Teacher-student Dialogue. These patterns can be combined to generate diverse structured interactions and feedback signals for agent training, covering a range of task types.
3. **Empirical research and findings**: Experiments on public benchmark tasks evaluate the effectiveness of LTC. The results show that LTC outperforms instruction-tuning and prompting baselines on multiple benchmarks.

Through these contributions, the paper demonstrates the effectiveness and versatility of the LTC framework in promoting the online adaptation of LLM agents in both single-agent and multi-agent environments.
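The universal buffer and the explore/update loop described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the names `Record`, `UniversalBuffer`, `run_ltc_loop`, `toy_explore`, and `toy_update` are hypothetical and do not come from the paper, and the update step is a stub, because this summary does not specify the training objective.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Record:
    """One buffer entry (hypothetical schema, not the paper's)."""
    speaker: str                    # who produced the text, e.g. "agent" or "env"
    text: str                       # linguistic content: thought, action, feedback
    reward: Optional[float] = None  # non-linguistic signal, when one exists


@dataclass
class UniversalBuffer:
    """Stores linguistic and non-linguistic feedback side by side."""
    records: List[Record] = field(default_factory=list)

    def add(self, speaker: str, text: str, reward: Optional[float] = None) -> None:
        self.records.append(Record(speaker, text, reward))

    def sample(self, k: int, rng: random.Random) -> List[Record]:
        # Draw at most k distinct records without replacement.
        return rng.sample(self.records, min(k, len(self.records)))


def run_ltc_loop(
    explore: Callable[[UniversalBuffer], None],
    update: Callable[[List[Record]], float],
    iterations: int,
    batch_size: int,
    seed: int = 0,
) -> Tuple[UniversalBuffer, List[float]]:
    """Alternate an exploration phase and an update phase for a fixed number of rounds."""
    rng = random.Random(seed)
    buffer = UniversalBuffer()
    stats: List[float] = []
    for _ in range(iterations):
        explore(buffer)                         # agent interacts, writes to the buffer
        batch = buffer.sample(batch_size, rng)  # draw stored feedback
        stats.append(update(batch))             # stub for a policy update
    return buffer, stats


def toy_explore(buffer: UniversalBuffer) -> None:
    """Stand-in for a single-agent monologue episode (assumed content)."""
    buffer.add("agent", "think: the mug is probably in the kitchen")
    buffer.add("agent", "act: go to kitchen")
    buffer.add("env", "obs: you see a mug", reward=1.0)


def toy_update(batch: List[Record]) -> float:
    """Placeholder update: returns the mean reward of the sampled batch."""
    rewards = [r.reward for r in batch if r.reward is not None]
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    buffer, stats = run_ltc_loop(toy_explore, toy_update, iterations=2, batch_size=3)
    print(len(buffer.records), stats)
```

Keeping text and reward in one record is the design choice this sketch is meant to show: a single sampled batch can feed both a language-based signal and a reward-based one. A multi-agent or teacher-student pattern would differ only in what the `explore` callback writes to the buffer.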

Adapting LLM Agents with Universal Feedback in Communication
