From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang
2024-07-20
Abstract: In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, we consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. Under proper assumptions on the pretraining data, we prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning. Additionally, we highlight the necessity for exploration beyond the subgoals derived from BAIL by proving that naively executing the subgoals returned by the LLM leads to linear regret. As a remedy, we introduce an $\epsilon$-greedy exploration strategy to BAIL, which is proven to incur sublinear regret when the pretraining error is small. Finally, we extend our theoretical framework to include scenarios where the LLM Planner serves as a world model for inferring the transition model of the environment and to multi-agent settings, enabling coordination among multiple Actors.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper aims to establish a theoretical foundation for how agents powered by large language models (LLMs) solve decision-making problems in the physical world. Specifically, it addresses four core questions:

1. **Theoretical Model**: How can a theoretical model be constructed to understand the performance of LLM agents?
2. **Decision Mechanism**: How do pretrained LLMs solve decision-making problems in the physical world through prompting?
3. **Exploration and Exploitation**: How do LLM agents handle the trade-off between exploration and exploitation?
4. **Impact of Statistical Errors**: How do the statistical errors of the pretrained LLM and the Reporter affect the overall performance of LLM agents?

To answer these questions, the paper proposes a theoretical framework based on hierarchical reinforcement learning (HRL) with three components: the LLM acts as the Planner, responsible for high-level task planning; the Actor executes low-level actions; and the Reporter converts information from the physical environment into natural language feedback for the Planner. Within this framework, the paper establishes the following points:

- **Bayesian Aggregated Imitation Learning (BAIL)**: When the pretraining data contains expert trajectories, the pretrained LLM Planner is shown to perform Bayesian aggregated imitation learning through in-context learning (ICL) during the prompting phase.
- **Exploration Strategy**: Naively executing the subgoals generated by BAIL leads to linear regret. An ϵ-greedy exploration strategy is introduced as a remedy and is proven to incur sublinear regret when the pretraining error is small.
- **Performance Analysis**: The paper characterizes how the statistical errors of the pretrained LLM and the Reporter affect overall performance, and provides performance guarantees under practical settings.
Additionally, the paper extends the framework to two further settings: one where the LLM Planner serves as a world model for inferring the environment's transition model, and a multi-agent setting where the Planner coordinates multiple Actors. Together, these results provide a theoretical foundation for understanding and improving the decision-making capabilities of LLM agents in the physical world.
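
To make the Planner-Reporter loop and the ϵ-greedy remedy described above more concrete, the following is a minimal toy sketch, not the paper's construction. It assumes a finite set of candidate environments `z`, tabular expert subgoal distributions, binary Reporter feedback, and uniform random exploration. All names (`posterior_update`, `aggregated_policy`, `epsilon_greedy_subgoal`, `run_episode`, `obs_model`) are hypothetical and introduced only for illustration.

```python
import random


def posterior_update(prior, likelihoods):
    """Bayes rule: posterior[z] is proportional to prior[z] * likelihood[z]."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnorm)
    if total == 0:
        # The observation is impossible under every candidate; keep the prior.
        return list(prior)
    return [u / total for u in unnorm]


def aggregated_policy(posterior, expert_policies):
    """Mix the per-environment expert subgoal distributions by posterior weight."""
    n_subgoals = len(expert_policies[0])
    return [
        sum(posterior[z] * expert_policies[z][g] for z in range(len(posterior)))
        for g in range(n_subgoals)
    ]


def epsilon_greedy_subgoal(posterior, expert_policies, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise sample the mixture."""
    n_subgoals = len(expert_policies[0])
    if rng.random() < epsilon:
        return rng.randrange(n_subgoals)
    probs = aggregated_policy(posterior, expert_policies)
    return rng.choices(range(n_subgoals), weights=probs, k=1)[0]


def run_episode(true_z, expert_policies, obs_model, epsilon, steps, seed=0):
    """Simulate the loop: pick a subgoal, receive feedback, update the belief.

    obs_model[z][g] is the assumed probability that the Reporter returns
    feedback 1 when subgoal g is executed in environment z.
    """
    rng = random.Random(seed)
    n_envs = len(expert_policies)
    posterior = [1.0 / n_envs] * n_envs  # uniform prior over environments
    for _ in range(steps):
        g = epsilon_greedy_subgoal(posterior, expert_policies, epsilon, rng)
        # Toy Reporter: binary feedback drawn from the true environment.
        o = 1 if rng.random() < obs_model[true_z][g] else 0
        likelihoods = [
            obs_model[z][g] if o == 1 else 1.0 - obs_model[z][g]
            for z in range(n_envs)
        ]
        posterior = posterior_update(posterior, likelihoods)
    return posterior


if __name__ == "__main__":
    experts = [[1.0, 0.0], [0.0, 1.0]]    # each expert prefers a different subgoal
    obs_model = [[0.9, 0.5], [0.1, 0.5]]  # subgoal 1 yields uninformative feedback
    print(run_episode(0, experts, obs_model, epsilon=0.2, steps=200))
```

In this toy, subgoal 1 produces the same feedback distribution in both environments, so a Planner that only ever chose it would never refine its belief. The ϵ branch guarantees that informative subgoals are still tried occasionally, which is the intuition behind adding exploration on top of the aggregated policy.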