Reinforcement Learning Problem Solving with Large Language Models

Sina Gholamian, Domingo Huh
2024-04-29
Abstract: Large Language Models (LLMs) encapsulate an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of a variety of Natural Language Processing (NLP) tasks. This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems. However, one interesting avenue that shows untapped potential is the use of LLMs as Reinforcement Learning (RL) agents to enable conversational RL problem solving. Therefore, in this study, we explore the concept of formulating Markov Decision Process-based RL problems as LLM prompting tasks. We demonstrate how LLMs can be iteratively prompted to learn and optimize policies for specific RL tasks. In addition, we leverage the introduced prompting technique for episode simulation and Q-Learning, facilitated by LLMs. We then show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake" workflows.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper examines how large language models (LLMs) can act as reinforcement learning (RL) agents to solve problems formulated as Markov decision processes (MDPs). LLMs already perform well on natural language processing tasks and can converse with humans, but their potential as RL agents for problem solving has not been fully explored. The authors propose an iterative prompting strategy that recasts an RL problem as a prompting task, so that the LLM learns and refines a policy for a specific RL task over successive prompts. They also use the same prompting technique for episode simulation and Q-learning, so that the LLM itself carries out the policy learning and returns the resulting policy. The paper demonstrates the practicality of this approach through two case studies, the "Research Scientist" and "Legal Matter Intake" workflows, and reports that the LLM finds the optimal workflow in at most two iterations. In summary, the paper addresses how to use the inherent knowledge and reasoning capabilities of LLMs to solve RL problems through iterative prompting, making the interaction between AI systems and human users more intuitive. The summary suggests that this approach could significantly change how RL problems are optimized.
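For readers unfamiliar with Q-learning, the following is a minimal sketch of the conventional tabular algorithm on a toy workflow MDP. It is not the authors' method: the paper has an LLM carry out episode simulation and Q-learning through prompting, whereas this sketch runs the update loop directly in Python. The states, actions, rewards, and hyperparameters are all hypothetical and are not taken from the paper's case studies.

```python
import random

# Hypothetical toy workflow MDP (assumed for illustration, not from the paper).
STATES = ["start", "drafted", "reviewed", "done"]
ACTIONS = ["advance", "skip"]


def step(state, action):
    """Return (next_state, reward) for the toy workflow."""
    i = STATES.index(state)
    if action == "advance":
        nxt = STATES[i + 1]
        # Each intermediate step costs 1; completing the workflow pays 10.
        reward = 10.0 if nxt == "done" else -1.0
        return nxt, reward
    # "skip" ends the workflow immediately and is penalized.
    return "done", -5.0


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "start"
        while state != "done":
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {
        s: max(ACTIONS, key=lambda a: q[(s, a)])
        for s in STATES
        if s != "done"
    }


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this toy setting the learned greedy policy chooses "advance" in every state, since completing each step in order yields a higher discounted return than skipping. In the prompting formulation described above, the same ingredients (states, actions, rewards, and the update rule) would presumably be described to the LLM in text rather than coded.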