Abstract: Reinforcement learning (RL) trains agents to accomplish complex tasks through environmental interaction data, but its capacity is also limited by the scope of the available data. To obtain a knowledgeable agent, a promising approach is to leverage the knowledge from large language models (LLMs). Despite previous studies combining LLMs with RL, seamless integration of the two components remains challenging due to their semantic gap. This paper introduces a novel method, Knowledgeable Agents from Language Model Rollouts (KALM), which extracts knowledge from LLMs in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods. The primary challenge of KALM lies in LLM grounding, as LLMs are inherently limited to textual data, whereas environmental data often comprise numerical vectors unseen to LLMs. To address this, KALM fine-tunes the LLM to perform various tasks based on environmental data, including bidirectional translation between natural language descriptions of skills and their corresponding rollout data. This grounding process enhances the LLM's comprehension of environmental dynamics, enabling it to generate diverse and meaningful imaginary rollouts that reflect novel skills. Initial empirical evaluations on the CLEVR-Robot environment demonstrate that KALM enables agents to complete complex rephrasings of task goals and extend their capabilities to novel tasks requiring unprecedented optimal behaviors. KALM achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods. Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.

What problem does this paper attempt to address?

The paper aims to improve an agent's ability to handle unseen tasks by combining large language models (LLMs) with offline reinforcement learning (RL). It proposes a method named "Knowledgeable Agents from Language Model Rollouts" (KALM) to address the limited generalization of traditional RL methods, whose capabilities are bounded by the data they are trained on. KALM proceeds in three steps:

1. **LLM Environment Adaptation**: Fine-tune the pre-trained LLM so that it can understand and process the environment's states, actions, and dynamics. The LLM is trained with supervised learning on four tasks: predicting environmental dynamics, explaining a given rollout sequence, generating a rollout sequence for a specified goal, and predicting the final state reached after achieving a given goal.
2. **Generate New Skill Rollouts**: Prompt the fine-tuned LLM (for example, "Generate rollouts for the following goal: [goal]") to generate imaginary rollout sequences. These rollouts represent new skills, which can be either rephrasings of existing task goals (language variants) or entirely unseen tasks.
3. **Acquire New Skills through Offline Reinforcement Learning**: Train the policy network with an offline RL method on both the existing offline dataset and the imaginary rollouts generated by the LLM, so that the agent learns to perform new tasks or skills without directly interacting with the environment.

The paper evaluates KALM through a series of experiments in the CLEVR-Robot environment. The results show that KALM improves the success rate of agents on unseen tasks (46% versus 26% for baseline methods on tasks with unseen goals, as reported in the abstract), especially on tasks that require combining multiple behaviors.
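
The four supervised fine-tuning tasks in step 1 can be illustrated with a minimal sketch that turns one rollout into prompt/target pairs. Only the prompt "Generate rollouts for the following goal: [goal]" comes from the text above. The `Rollout` class, the helper names, the other prompt templates, and the choice to serialize numeric vectors as plain text are assumptions made for illustration; the paper may represent states and actions differently.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """Hypothetical container for one trajectory and its language goal."""
    goal: str                      # natural-language description of the skill
    states: List[Sequence[float]]  # s_0 ... s_T (T + 1 entries)
    actions: List[int]             # a_0 ... a_{T-1} (T entries)


def fmt_vec(vec: Sequence[float]) -> str:
    """Serialize a numeric vector as text (an assumption, not the paper's encoding)."""
    return "[" + ", ".join(f"{x:.2f}" for x in vec) + "]"


def fmt_rollout(r: Rollout) -> str:
    """Serialize the (state, action) pairs of a rollout as one string."""
    return " ; ".join(f"s={fmt_vec(s)} a={a}" for s, a in zip(r.states, r.actions))


def build_sft_examples(r: Rollout) -> List[Tuple[str, str]]:
    """Return (prompt, target) pairs covering the four fine-tuning tasks."""
    examples: List[Tuple[str, str]] = []

    # Task 1: dynamics prediction, one example per transition (s, a) -> s'.
    for s, a, s_next in zip(r.states, r.actions, r.states[1:]):
        examples.append(
            (f"Predict the next state: s={fmt_vec(s)} a={a}", fmt_vec(s_next))
        )

    # Task 2: rollout explanation, rollout -> language goal.
    examples.append((f"Explain the following rollout: {fmt_rollout(r)}", r.goal))

    # Task 3: rollout generation, language goal -> rollout (prompt from the text).
    examples.append(
        (f"Generate rollouts for the following goal: {r.goal}", fmt_rollout(r))
    )

    # Task 4: final-state prediction, goal and initial state -> last state.
    examples.append(
        (
            f"Predict the final state for the goal: {r.goal} "
            f"starting from s={fmt_vec(r.states[0])}",
            fmt_vec(r.states[-1]),
        )
    )
    return examples


if __name__ == "__main__":
    demo = Rollout(
        goal="move the red ball left",
        states=[[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]],
        actions=[2, 2],
    )
    for prompt, target in build_sft_examples(demo):
        print(prompt, "->", target)
```

Tasks 2 and 3 are inverses of each other, which matches the bidirectional translation between skill descriptions and rollout data described in the abstract.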
The study also found that training on a combination of offline data and LLM-generated rollouts not only improves the agents' performance on new tasks but also maintains or even improves their performance on the original tasks.
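
One simple way to train on both data sources is to draw each training batch partly from the real offline dataset and partly from the imaginary rollouts. The sketch below shows such a sampler. The function name `sample_mixed_batch` and the `imaginary_ratio` parameter are hypothetical, and the source does not state how the paper mixes the two sources, so this is one plausible scheme and not the authors' procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_mixed_batch(
    real: Sequence[T],
    imaginary: Sequence[T],
    batch_size: int,
    imaginary_ratio: float,
    rng: random.Random,
) -> List[T]:
    """Sample a batch mixing real transitions with LLM-generated ones.

    `imaginary_ratio` (hypothetical knob) is the fraction of the batch drawn
    from the imaginary rollouts; the remainder comes from the offline dataset.
    """
    if not 0.0 <= imaginary_ratio <= 1.0:
        raise ValueError("imaginary_ratio must be in [0, 1]")

    # Fall back to real data only when no imaginary rollouts are available.
    n_imag = round(batch_size * imaginary_ratio) if imaginary else 0
    n_real = batch_size - n_imag
    if n_real > 0 and not real:
        raise ValueError("real dataset is empty")

    # Sample with replacement from each source, then shuffle the combined batch.
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(imaginary) for _ in range(n_imag)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    real_data = [("real", i) for i in range(100)]
    imagined = [("imaginary", i) for i in range(20)]
    print(sample_mixed_batch(real_data, imagined, 8, 0.25, rng))
```

Each batch can then be passed to whichever offline RL algorithm is used to update the policy. Keeping real data in every batch is one way to preserve performance on the original tasks while the imaginary rollouts supply the new skills.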

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling

Mental Modeling of Reinforcement Learning Agents by Language Models

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Empowering Large Language Model Agents through Action Learning

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

Offline RL for Natural Language Generation with Implicit Language Q Learning

LLM Augmented Hierarchical Agents

RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Reinforcement Learning Problem Solving with Large Language Models

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study