Abstract: In sequential decision-making (SDM) tasks, methods like reinforcement learning (RL) and heuristic search have made notable advances in specific cases. However, they often require extensive exploration and face challenges in generalizing across diverse environments due to their limited grasp of the underlying decision dynamics. In contrast, large language models (LLMs) have recently emerged as powerful general-purpose tools, due to their capacity to maintain vast amounts of domain-specific knowledge. To harness this rich prior knowledge for efficiently solving complex SDM tasks, we propose treating LLMs as prior action distributions and integrating them into RL frameworks through Bayesian inference methods, making use of variational inference and direct posterior sampling. The proposed approaches facilitate the seamless incorporation of fixed LLM priors into both policy-based and value-based RL frameworks. Our experiments show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity, substantially improving sample efficiency compared to traditional RL techniques, e.g., using LLM priors decreases the number of required samples by over 90% in offline learning scenarios.
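One standard way to make the Bayesian view above concrete is the control-as-inference formulation sketched below. It is an illustration under stated assumptions and not necessarily the paper's exact notation or objective: p_LLM(a | s) denotes the LLM's action prior, Q(s, a) a learned action-value function, alpha > 0 a temperature, and O a binary "optimality" variable whose likelihood is assumed proportional to exp(Q(s, a) / alpha).

```latex
% Bayesian posterior over actions: LLM prior times an exponentiated-Q likelihood
p(a \mid s, \mathcal{O}=1)
  = \frac{p_{\mathrm{LLM}}(a \mid s)\,\exp\!\big(Q(s,a)/\alpha\big)}
         {\sum_{a'} p_{\mathrm{LLM}}(a' \mid s)\,\exp\!\big(Q(s,a')/\alpha\big)}

% Variational view: the same posterior maximizes a KL-regularized objective
\pi^{*}(\cdot \mid s)
  = \arg\max_{\pi}\;
    \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big]
    - \alpha\, D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s)\,\big\|\,p_{\mathrm{LLM}}(\cdot \mid s)\big)

% Direct posterior sampling: propose from the prior, then resample by the likelihood
a_1,\dots,a_k \sim p_{\mathrm{LLM}}(\cdot \mid s), \qquad
\Pr(\text{choose } a_i) = \frac{\exp\!\big(Q(s,a_i)/\alpha\big)}{\sum_{j=1}^{k} \exp\!\big(Q(s,a_j)/\alpha\big)}
```

As alpha grows, the posterior falls back to the LLM prior; as alpha shrinks, it concentrates on the highest-Q actions that the prior supports. The resampling step is a sampling-importance-resampling approximation that becomes exact only as k grows large.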

What problem does this paper attempt to address?

The paper addresses how to use large language models (LLMs) as prior knowledge in sequential decision-making (SDM) tasks to improve the sample efficiency and generalization of reinforcement learning (RL) algorithms. Specifically, it treats an LLM as a prior distribution over actions and integrates that prior into the RL framework through Bayesian inference. The prior narrows the space the agent must explore and reduces optimization complexity, which substantially improves sample efficiency on complex SDM tasks; in the offline learning setting, for example, using the LLM prior reduces the number of required samples by more than 90%.

The main contributions of the paper are:

1. A unified framework for incorporating LLMs as probabilistic priors into Markov decision processes (MDPs).
2. Concrete instantiations of this framework: the LLM serves as an improved action sampler in value-based online RL, as a behavior regularizer in offline RL, and as a KL-divergence loss term in policy-based RL.
3. Extensive experiments on benchmarks including ALFWorld and Overcooked, showing that the framework substantially improves sample efficiency over pure-RL and pure-LLM baselines and yields value functions that are more robust and generalize better.

The paper frames the use of LLMs for complex SDM tasks from a Bayesian-inference perspective: the LLM defines a prior distribution over actions, and the posterior action distribution that satisfies the task objective is approximated with probabilistic inference methods such as variational inference and direct posterior sampling. According to the paper, these methods improve both the sample efficiency of the algorithm and its generalization across environments.
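To make the "action sampler" and "KL loss term" roles described above more concrete, here is a minimal NumPy sketch over a small discrete action set. It assumes the LLM prior has already been reduced to a probability vector over candidate actions (for example, by normalizing the model's likelihoods of action strings) and that a critic supplies Q-values. The function names, the temperature `alpha`, the candidate count `k`, and the toy numbers are all hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def posterior_from_prior(prior_probs, q_values, alpha):
    """Closed-form maximizer of E_pi[Q] - alpha * KL(pi || prior) on a finite
    action set: pi*(a) is proportional to prior(a) * exp(Q(a) / alpha).
    (Hypothetical helper for illustration.)"""
    prior_probs = np.asarray(prior_probs, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) -> -inf: action never chosen
        logits = np.log(prior_probs) + q_values / alpha
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def kl_regularized_objective(policy_probs, prior_probs, q_values, alpha):
    """E_pi[Q] - alpha * KL(pi || prior). A policy-based learner would
    minimize the negative of this value as its loss. (Hypothetical helper.)"""
    pi = np.asarray(policy_probs, dtype=float)
    prior = np.asarray(prior_probs, dtype=float)
    q = np.asarray(q_values, dtype=float)
    support = pi > 0  # actions with pi(a) = 0 contribute nothing to the KL
    kl = np.sum(pi[support] * (np.log(pi[support]) - np.log(prior[support])))
    return float(np.dot(pi, q) - alpha * kl)


def sample_action(prior_probs, q_values, alpha, k, rng):
    """Approximate posterior sampling: draw k candidate actions from the prior
    (the LLM acting as a proposal distribution), then resample one candidate
    with weight exp(Q / alpha). (Hypothetical helper.)"""
    prior_probs = np.asarray(prior_probs, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    candidates = rng.choice(len(prior_probs), size=k, p=prior_probs)
    cand_q = q_values[candidates]
    weights = np.exp((cand_q - cand_q.max()) / alpha)
    return int(candidates[rng.choice(k, p=weights / weights.sum())])


if __name__ == "__main__":
    prior = np.array([0.7, 0.2, 0.1])  # toy "LLM prior" over three actions
    q = np.array([0.0, 1.0, 0.5])      # toy critic values
    post = posterior_from_prior(prior, q, alpha=0.5)
    print("posterior:", np.round(post, 3))
    rng = np.random.default_rng(0)
    draws = [sample_action(prior, q, 0.5, k=200, rng=rng) for _ in range(2000)]
    print("empirical:", np.bincount(draws, minlength=3) / len(draws))
```

In this toy case the prior favors action 0, but the critic rates action 1 highest, so the posterior moves most of its mass to action 1. The prior keeps the policy close to actions the LLM considers plausible, while the Q term pulls it toward actions the critic rates highly; resampling from k prior proposals only approximates the exact posterior and improves as k grows.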

Efficient Reinforcement Learning with Large Language Model Priors
