Abstract: A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs not just to generate completions for a given prompt, but to make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a general paradigm to address such agent tasks, but current RL methods for LLMs largely focus on optimizing single-turn rewards. By construction, most single-turn RL methods cannot endow LLMs with the ability to intelligently seek information over multiple turns, perform credit assignment, or reason about their past actions -- all of which are critical in agent tasks. This raises the question: how can we design effective and efficient multi-turn RL algorithms for LLMs? In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization), while effectively accommodating multiple turns, long horizons, and delayed rewards. To do this, our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel: a high-level off-policy value-based RL algorithm that aggregates reward over utterances, and a low-level RL algorithm that uses this high-level value function to train a token policy within each utterance or turn. Our hierarchical framework, Actor-Critic Framework with a Hierarchical Structure (ArCHer), can also give rise to other RL methods. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks, attaining about 100x better sample efficiency than existing methods, while also improving with larger model capacity (up to the 7 billion scale that we tested on).
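
The two-level structure described in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation: the table-based critic, the TD(0) update, the advantage form Q - V, and all function and variable names below are assumptions chosen only to show how an utterance-level value function could supply the learning signal for a token-level policy.

```python
from collections import defaultdict

GAMMA = 0.95  # discount factor over utterances (illustrative value)


def td_update_utterance_critic(q, v, transition, lr=0.1, gamma=GAMMA):
    """High level: one TD(0) step on an utterance-level Q table.

    `transition` is (state, utterance, reward, next_state, done) and may
    come from a replay buffer, which is what makes the update off-policy.
    """
    state, utterance, reward, next_state, done = transition
    target = reward + (0.0 if done else gamma * v[next_state])
    q[(state, utterance)] += lr * (target - q[(state, utterance)])
    return q[(state, utterance)]


def utterance_advantage(q, v, state, utterance):
    """Advantage of a whole utterance under the high-level critic."""
    return q[(state, utterance)] - v[state]


def token_level_signal(tokens, advantage):
    """Low level: every token of the utterance shares the utterance-level
    advantage as its policy-gradient weight (an assumed, simplest choice)."""
    return [(token, advantage) for token in tokens]


if __name__ == "__main__":
    q = defaultdict(float)  # Q(state, utterance)
    v = defaultdict(float)  # V(state)
    # A single terminal turn with delayed reward 1.0.
    td_update_utterance_critic(q, v, ("s0", "ask price", 1.0, "s1", True), lr=0.5)
    adv = utterance_advantage(q, v, "s0", "ask price")
    print(token_level_signal(["ask", "price"], adv))
```

In a real system the tables would be replaced by learned networks and the per-token weights would feed a policy-gradient loss for the language model; the sketch only shows the direction of information flow from the utterance level to the token level.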

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Efficient Human-AI Coordination via Preparatory Language-based Convention

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

MindAgent: Emergent Gaming Interaction

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

Building Cooperative Embodied Agents Modularly with Large Language Models

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

Embodied LLM Agents Learn to Cooperate in Organized Teams

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

LLM Agent Operating System

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

AIOS: LLM Agent Operating System

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Theory of Mind for Multi-Agent Collaboration via Large Language Models

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model