Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

2024-02-11

Abstract: We present the LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm, which encodes high-level knowledge into reinforcement learning as an automaton in order to expedite learning. Our method uses large language models (LLMs) to obtain high-level domain-specific knowledge through prompt engineering, instead of supplying that knowledge to the reinforcement learning algorithm directly, which requires an expert to encode the automaton. We use chain-of-thought and few-shot methods for prompt engineering and demonstrate that our method works with both approaches. Additionally, LARL-RM allows for fully closed-loop reinforcement learning without an expert to guide and supervise the learning, since LARL-RM can query the LLM directly to generate the high-level knowledge required for the task at hand. We also provide a theoretical guarantee that our algorithm converges to an optimal policy. By implementing our method in two case studies, we demonstrate that LARL-RM speeds up convergence by 30%.
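To make the term "reward machine" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a reward machine is a finite automaton whose transitions are triggered by high-level event labels and emit a scalar reward. The class name `RewardMachine`, the labels `"key"` and `"door"`, and the choice to self-loop with zero reward on unlisted transitions are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: an automaton whose transitions emit rewards."""

    initial_state: int
    # Maps (state, label) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, label):
        # Assumption: unlisted (state, label) pairs self-loop with zero reward.
        return self.transitions.get((state, label), (state, 0.0))

    def run(self, labels):
        # Feed a sequence of labels through the machine and sum the rewards.
        state, total = self.initial_state, 0.0
        for label in labels:
            state, reward = self.step(state, label)
            total += reward
        return state, total


# Illustrative task (not from the paper): pick up the key, then reach the door.
rm = RewardMachine(
    initial_state=0,
    transitions={
        (0, "key"): (1, 0.0),   # key collected, no reward yet
        (1, "door"): (2, 1.0),  # door reached after the key: task reward
    },
)
```

In this sketch, reaching the door before collecting the key leaves the machine in its initial state, so the reward depends on the history of events rather than on the current observation alone.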

Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses how to use large language models (LLMs) to automatically generate deterministic finite automata (DFAs) that accelerate reinforcement learning (RL) and guide it toward an optimal policy more quickly. Specifically, it proposes a method called LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine), which works as follows:

1. **Extracting Domain-Specific Knowledge**: LLMs supply high-level domain knowledge through prompt engineering, so the knowledge does not have to be encoded into the RL algorithm by hand, which typically requires an expert.
2. **Automatic Adjustment and Updating**: LARL-RM allows for fully closed-loop reinforcement learning without expert supervision, because it can use the high-level knowledge generated by the LLM directly.
3. **Theoretical Guarantee**: The paper gives a theoretical guarantee that LARL-RM converges to an optimal policy and shows in two case studies that the method speeds up RL convergence by 30%.

In this way, the paper aims to remove the dependence on expert knowledge that traditional methods have when constructing reward functions for complex tasks, and to improve learning efficiency through automation.
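The first step above, obtaining an automaton from an LLM via a few-shot prompt, can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's pipeline: the prompt wording, the `src --label--> dst` line format, and the helper names `build_prompt`, `parse_dfa`, `accepts`, and `stub_llm` are all hypothetical, and `stub_llm` returns a canned answer in place of a real model call.

```python
import re

# Hypothetical few-shot prompt: one worked example, then the new task.
FEW_SHOT_PROMPT = """Describe the task as DFA transitions, one per line,
in the form: src --label--> dst

Task: press the button, then open the gate
q0 --button--> q1
q1 --gate--> q2

Task: {task}
"""

TRANSITION = re.compile(r"^\s*(\w+)\s*--(\w+)-->\s*(\w+)\s*$")


def build_prompt(task):
    return FEW_SHOT_PROMPT.format(task=task)


def parse_dfa(text):
    """Parse 'src --label--> dst' lines into {(src, label): dst}; other lines are ignored."""
    transitions = {}
    for line in text.splitlines():
        match = TRANSITION.match(line)
        if match:
            src, label, dst = match.groups()
            transitions[(src, label)] = dst
    return transitions


def accepts(transitions, start, accepting, labels):
    # Assumption: labels with no listed transition leave the state unchanged.
    state = start
    for label in labels:
        state = transitions.get((state, label), state)
    return state in accepting


def stub_llm(prompt):
    # Stand-in for a real LLM call; returns a canned answer for illustration.
    return "q0 --key--> q1\nq1 --door--> q2\n"


dfa = parse_dfa(stub_llm(build_prompt("pick up the key, then reach the door")))
```

Parsing into a plain dictionary keeps the LLM output checkable before it is handed to a learner: a malformed line is simply skipped, and the resulting automaton can be tested against known label sequences with `accepts`.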
