Abstract: In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, we consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. Under proper assumptions on the pretraining data, we prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning. Additionally, we highlight the necessity for exploration beyond the subgoals derived from BAIL by proving that naively executing the subgoals returned by the LLM leads to linear regret. As a remedy, we introduce an $\epsilon$-greedy exploration strategy to BAIL, which is proven to incur sublinear regret when the pretraining error is small. Finally, we extend our theoretical framework to include scenarios where the LLM Planner serves as a world model for inferring the transition model of the environment and to multi-agent settings, enabling coordination among multiple Actors.

What problem does this paper attempt to address?

The paper aims to establish a theoretical foundation for how large language models (LLMs) solve decision-making problems in the physical world. Specifically, it attempts to answer the following core questions:

1. **Theoretical Model**: How can a theoretical model be constructed to understand the performance of LLM agents?
2. **Decision Mechanism**: How do pretrained LLMs solve decision-making problems in the physical world through prompting?
3. **Exploration and Exploitation**: How do LLM agents handle the trade-off between exploration and exploitation?
4. **Impact of Statistical Errors**: How do the statistical errors of the pretrained LLM and the Reporter affect the overall performance of LLM agents?

To answer these questions, the paper proposes a theoretical framework based on hierarchical reinforcement learning (HRL) with three components: the LLM acts as the Planner, responsible for high-level task planning; the Actor is responsible for low-level action execution; and the Reporter converts information from the physical environment into natural language feedback for the Planner. Within this framework, the paper establishes the following points:

- **Bayesian Aggregated Imitation Learning (BAIL)**: When the pretraining data contains expert trajectories, the LLM Planner is shown to perform Bayesian aggregated imitation learning through in-context learning (ICL) during the prompting phase.
- **Exploration Strategy**: An ϵ-greedy exploration strategy is introduced to overcome the linear regret incurred by relying solely on subgoals generated by BAIL; with exploration, the regret is sublinear when the pretraining error is small.
- **Performance Analysis**: The paper characterizes how the statistical errors of the pretrained LLM and the Reporter affect overall performance, and provides performance guarantees under practical settings.
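To make the idea of Bayesian aggregation concrete, here is a minimal sketch, assuming a finite set of latent tasks and a known expert subgoal distribution per task. The names `posterior`, `aggregated_policy`, the task labels, and all numbers are hypothetical illustrations, not the paper's notation or construction: the sketch only shows a posterior over tasks being used to average expert policies.

```python
def posterior(prior, likelihoods):
    """Bayes rule over a finite set of latent tasks (hypothetical setup)."""
    unnormalized = {z: prior[z] * likelihoods[z] for z in prior}
    total = sum(unnormalized.values())
    return {z: w / total for z, w in unnormalized.items()}


def aggregated_policy(post, expert_policies):
    """Mix per-task expert subgoal distributions, weighted by the posterior."""
    subgoals = {g for pi in expert_policies.values() for g in pi}
    return {
        g: sum(post[z] * expert_policies[z].get(g, 0.0) for z in post)
        for g in subgoals
    }


# Illustrative numbers only: two candidate tasks, one observed history.
prior = {"task_a": 0.5, "task_b": 0.5}
likelihoods = {"task_a": 0.9, "task_b": 0.1}  # P(history | task), assumed
experts = {
    "task_a": {"go_left": 1.0},
    "task_b": {"go_right": 1.0},
}

post = posterior(prior, likelihoods)
policy = aggregated_policy(post, experts)
```

In this toy setting the history favors `task_a`, so the aggregated policy places most of its mass on that task's expert subgoal while keeping some mass on the alternative.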
Additionally, the paper extends the theoretical framework to settings where the LLM Planner serves as a world model that infers the environment's transition model, and to multi-agent settings that require coordination among multiple Actors. These results provide a theoretical foundation for understanding and improving the decision-making capabilities of LLM agents in the physical world.
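The ϵ-greedy strategy mentioned above can be sketched as follows. This is a generic illustration, assuming a finite subgoal space and uniform random exploration; the function name `epsilon_greedy_subgoal`, the subgoal labels, and the fixed value of ϵ are hypothetical, and the paper's exact exploration policy and schedule may differ.

```python
import random


def epsilon_greedy_subgoal(planner_subgoal, subgoal_space, epsilon, rng):
    """Return a random subgoal with probability epsilon, else the Planner's."""
    if rng.random() < epsilon:
        # Explore: ignore the Planner and sample a subgoal uniformly.
        return rng.choice(subgoal_space)
    # Exploit: execute the subgoal proposed by the (BAIL-style) Planner.
    return planner_subgoal


# Hypothetical subgoal space and Planner output for illustration.
subgoal_space = ["open_drawer", "pick_up_cup", "move_to_table"]
rng = random.Random(0)

chosen = [
    epsilon_greedy_subgoal("pick_up_cup", subgoal_space, epsilon=0.2, rng=rng)
    for _ in range(1000)
]
explore_free = [
    epsilon_greedy_subgoal("pick_up_cup", subgoal_space, epsilon=0.0, rng=rng)
    for _ in range(100)
]
```

With `epsilon=0.0` the agent always follows the Planner, which corresponds to the naive execution that the summary associates with linear regret; a positive `epsilon` occasionally collects information the Planner would not have requested.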

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics

Empowering Large Language Model Agents through Action Learning

Inner Monologue: Embodied Reasoning through Planning with Language Models

LLM-SAP: Large Language Models Situational Awareness Based Planning

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents

Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration

Embodied LLM Agents Learn to Cooperate in Organized Teams

Theory of Mind for Multi-Agent Collaboration via Large Language Models

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards