Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu
2024-02-11
Abstract: We present LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine), an algorithm that encodes high-level knowledge into reinforcement learning through an automaton in order to expedite learning. Providing this high-level knowledge to the reinforcement learning algorithm directly would require an expert to encode the automaton. Instead, our method uses Large Language Models (LLMs) to obtain high-level, domain-specific knowledge through prompt engineering. We use chain-of-thought and few-shot prompting and demonstrate that our method works with both approaches. Additionally, LARL-RM allows for fully closed-loop reinforcement learning without an expert to guide and supervise the learning, since it can query the LLM directly to generate the high-level knowledge required for the task at hand. We also provide a theoretical guarantee that our algorithm converges to an optimal policy. Finally, we implement our method in two case studies and show that LARL-RM speeds up convergence by 30%.
Artificial Intelligence, Computation and Language
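The abstract's core mechanism, an automaton (reward machine) that tracks progress through high-level events so that the agent learns over (environment state, automaton state) pairs, can be illustrated with a small sketch. It is a generic tabular Q-learning over the product of a toy environment and a hand-written DFA, not the LARL-RM algorithm itself. The corridor task, the labels `k` (key) and `d` (door), the reward of 1 on reaching an accepting state, and all hyperparameters are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """DFA over high-level labels; entering an accepting state yields reward 1
    (a simplifying assumption for this sketch)."""
    initial: str
    accepting: set
    delta: dict = field(default_factory=dict)  # (state, label) -> next state

    def step(self, q, label):
        # Unlisted (state, label) pairs are treated as self-loops.
        next_q = self.delta.get((q, label), q)
        reward = 1.0 if next_q in self.accepting and q not in self.accepting else 0.0
        return next_q, reward


# Hypothetical task: pick up the key ("k" at cell 4), then reach the door ("d" at cell 0).
rm = RewardMachine(initial="q0", accepting={"q2"},
                   delta={("q0", "k"): "q1", ("q1", "d"): "q2"})
LABELS = {4: "k", 0: "d"}
N_CELLS, START, ACTIONS = 5, 2, (-1, +1)


def env_step(pos, action):
    # 1-D corridor; moving into a wall leaves the agent in place.
    return min(max(pos + action, 0), N_CELLS - 1)


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, max_steps=50, seed=0):
    """Tabular Q-learning on product states (position, automaton state)."""
    rng = random.Random(seed)
    Q = {}  # (pos, q) -> list of action values

    def values(s):
        return Q.setdefault(s, [0.0] * len(ACTIONS))

    for _ in range(episodes):
        pos, q = START, rm.initial
        for _ in range(max_steps):
            v = values((pos, q))
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(v)  # exploit, breaking ties at random
                a = rng.choice([i for i, x in enumerate(v) if x == best])
            pos = env_step(pos, ACTIONS[a])
            q, r = rm.step(q, LABELS.get(pos))  # the automaton reads the new label
            done = q in rm.accepting
            target = r if done else r + gamma * max(values((pos, q)))
            v[a] += alpha * (target - v[a])
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=20):
    """Follow the greedy policy; return the visited cells and whether the task was completed."""
    pos, q, path = START, rm.initial, [START]
    for _ in range(max_steps):
        v = Q.get((pos, q), [0.0] * len(ACTIONS))
        pos = env_step(pos, ACTIONS[v.index(max(v))])
        q, _ = rm.step(q, LABELS.get(pos))
        path.append(pos)
        if q in rm.accepting:
            return path, True
    return path, False
```

Running `path, ok = greedy_rollout(q_learning())` should yield a greedy path that first moves right to fetch the key at cell 4 and then heads left to the door at cell 0. The automaton state lets the same cell carry different values before and after the key is collected. A flat Q-table over positions alone could not represent this.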
What problem does this paper attempt to address?
The paper addresses how to use large language models (LLMs) to automatically generate deterministic finite automata (DFAs) that accelerate reinforcement learning (RL) and guide it to an optimal policy more quickly. To this end, it proposes LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine), which works as follows:

1. **Extracting Domain-Specific Knowledge**: LARL-RM uses LLMs to obtain high-level domain knowledge through prompt engineering, rather than having this knowledge encoded directly into the RL algorithm, which typically requires an expert.
2. **Closed-Loop Learning**: LARL-RM enables fully closed-loop RL without expert supervision, because it can directly use the high-level knowledge generated by the LLM.
3. **Theoretical Guarantee and Empirical Speedup**: The paper establishes a theoretical guarantee that LARL-RM converges to an optimal policy and shows in two case studies that the method speeds up RL convergence by 30%.

Through these means, the paper aims to remove the reliance of traditional methods on expert knowledge for constructing the reward functions of complex tasks, and to improve learning efficiency through automation.
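Step 1 above, obtaining an automaton from an LLM via few-shot and chain-of-thought prompting, can be sketched as a prompt builder plus a parser. The paper's actual prompts and output format are not given here. The plain-text DFA format (`INITIAL:`, `ACCEPT:`, `src -label-> dst`), the prompt wording, the `fake_llm` stub standing in for a real model call, and the reachability check with re-prompting are all hypothetical illustrations of how the loop could run without an expert in it.

```python
import re
from collections import deque

# Hypothetical few-shot example written in an assumed plain-text DFA format.
FEW_SHOT = [
    ("Reach the goal g after visiting the checkpoint c.",
     "Reasoning: c must happen before g.\n"
     "INITIAL: q0\nACCEPT: q2\nq0 -c-> q1\nq1 -g-> q2"),
]

EDGE = re.compile(r"^(\w+)\s+-(\w+)->\s+(\w+)$")


def build_prompt(task, examples=FEW_SHOT):
    """Few-shot prompt that also asks for step-by-step (chain-of-thought) reasoning."""
    parts = ["Translate each task into a DFA over event labels. "
             "Think step by step, then list INITIAL, ACCEPT and the transitions."]
    for t, answer in examples:
        parts.append(f"Task: {t}\nAnswer:\n{answer}")
    parts.append(f"Task: {task}\nAnswer:")
    return "\n\n".join(parts)


def parse_dfa(text):
    """Parse the assumed format; reasoning lines that match no rule are ignored."""
    initial, accepting, delta = None, set(), {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("INITIAL:"):
            initial = line.split(":", 1)[1].strip()
        elif line.startswith("ACCEPT:"):
            accepting = {s.strip() for s in line.split(":", 1)[1].split(",") if s.strip()}
        elif m := EDGE.match(line):
            delta[(m.group(1), m.group(2))] = m.group(3)
    return initial, accepting, delta


def is_usable(initial, accepting, delta):
    """Sanity check: some accepting state must be reachable from the initial state."""
    if initial is None or not accepting:
        return False
    seen, frontier = {initial}, deque([initial])
    while frontier:
        q = frontier.popleft()
        for (src, _), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return bool(seen & accepting)


def fake_llm(prompt):
    # Stand-in for a real LLM call; returns a canned answer for the demo task.
    return ("Reasoning: the key k is needed before the door d.\n"
            "INITIAL: q0\nACCEPT: q2\nq0 -k-> q1\nq1 -d-> q2")


def generate_dfa(task, llm=fake_llm, max_attempts=3):
    """Query the LLM, parse its answer, and re-prompt if the automaton fails the check."""
    for _ in range(max_attempts):
        dfa = parse_dfa(llm(build_prompt(task)))
        if is_usable(*dfa):
            return dfa
    raise ValueError("no usable DFA after retries")
```

For example, `generate_dfa("Pick up the key k, then open the door d.")` returns the initial state, the accepting states, and a transition dictionary that a reward machine could consume. Swapping `fake_llm` for a real client only requires a function that maps a prompt string to a response string.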