Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
2024-04-14
Abstract: Reinforcement learning (RL) trains agents to accomplish complex tasks through environmental interaction data, but its capacity is also limited by the scope of the available data. To obtain a knowledgeable agent, a promising approach is to leverage the knowledge from large language models (LLMs). Despite previous studies combining LLMs with RL, seamless integration of the two components remains challenging due to their semantic gap. This paper introduces a novel method, Knowledgeable Agents from Language Model Rollouts (KALM), which extracts knowledge from LLMs in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods. The primary challenge of KALM lies in LLM grounding, as LLMs are inherently limited to textual data, whereas environmental data often comprise numerical vectors unseen to LLMs. To address this, KALM fine-tunes the LLM to perform various tasks based on environmental data, including bidirectional translation between natural language descriptions of skills and their corresponding rollout data. This grounding process enhances the LLM's comprehension of environmental dynamics, enabling it to generate diverse and meaningful imaginary rollouts that reflect novel skills. Initial empirical evaluations on the CLEVR-Robot environment demonstrate that KALM enables agents to complete complex rephrasings of task goals and extend their capabilities to novel tasks requiring unprecedented optimal behaviors. KALM achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods. Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper aims to improve an agent's ability to handle unseen tasks by combining large language models (LLMs) with offline reinforcement learning (RL). It proposes a method named "Knowledgeable Agents from Language Model Rollouts (KALM)", which targets the limited generalization of traditional RL methods, whose capability is bounded by the available dataset. KALM proceeds in three steps:

1. **LLM Environment Adaptation**: Fine-tune the pre-trained LLM so that it can understand and handle the states, actions, and dynamics of the environment. The LLM is trained with supervised learning on four tasks: predicting environmental dynamics, explaining a given rollout sequence, generating a rollout sequence for a specified goal, and predicting the final state reached after achieving a goal.
2. **Generate New Skill Rollouts**: Use the fine-tuned LLM to generate imaginary rollout sequences through specific prompts (such as "Generate rollouts for the following goal: [goal]"). These rollouts represent new skills, which can be either rephrasings of existing tasks (i.e., language variants) or entirely unseen tasks.
3. **Acquire New Skills through Offline Reinforcement Learning**: Train the policy network with an offline RL method on both the existing offline dataset and the imaginary rollouts generated by the LLM. This lets the agent learn new tasks or skills without interacting directly with the environment.

The paper evaluates KALM through a series of experiments in the CLEVR-Robot environment. According to the abstract, KALM reaches a 46% success rate on tasks with unseen goals, compared with 26% for baseline methods. The gains are most pronounced on tasks that require combining multiple behaviors.
In addition, the study found that training on a combination of offline data and LLM-generated rollouts not only improves the agent's performance on new tasks but also maintains or even enhances its performance on the original tasks.
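
As a rough illustration of the three steps, the following Python sketch builds supervised (prompt, target) pairs for the four grounding tasks from one rollout, then samples a training batch that mixes real and LLM-generated rollouts. It is not the authors' implementation. Only the prompt "Generate rollouts for the following goal: [goal]" comes from the summary above; the other prompt wordings, the `Rollout` container, the text encoding of states and actions, and the `imagined_ratio` parameter are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

State = List[float]
Action = List[float]


@dataclass
class Rollout:
    """Hypothetical container for one trajectory (not from the paper)."""
    goal: str                 # natural-language goal description
    states: List[State]       # s_0 ... s_T
    actions: List[Action]     # a_0 ... a_{T-1}
    imaginary: bool = False   # True if the rollout was generated by the LLM


def grounding_examples(r: Rollout) -> List[Tuple[str, str]]:
    """Build (prompt, target) pairs for the four fine-tuning tasks.

    Prompt wordings for tasks 1, 2, and 4 are assumed; task 3 uses the
    prompt quoted in the summary.
    """
    steps = [f"s={s} a={a}" for s, a in zip(r.states, r.actions)]
    traj = " ".join(steps + [f"s={r.states[-1]}"])
    return [
        # 1. Dynamics prediction: (state, action) -> next state
        (f"Predict the next state: s={r.states[0]} a={r.actions[0]}",
         str(r.states[1])),
        # 2. Rollout explanation: rollout -> goal text
        (f"Explain the following rollout: {traj}", r.goal),
        # 3. Rollout generation: goal text -> rollout
        (f"Generate rollouts for the following goal: {r.goal}", traj),
        # 4. Consequence prediction: goal + initial state -> final state
        (f"Predict the final state for the goal: {r.goal} s={r.states[0]}",
         str(r.states[-1])),
    ]


def sample_batch(real: List[Rollout], imagined: List[Rollout],
                 batch_size: int, imagined_ratio: float,
                 rng: random.Random) -> List[Rollout]:
    """Sample (with replacement) a batch mixing real and imaginary rollouts.

    imagined_ratio is an assumed hyperparameter; the summary does not state
    how the two data sources are weighted.
    """
    n_imag = round(batch_size * imagined_ratio) if imagined else 0
    batch = rng.choices(real, k=batch_size - n_imag)
    if n_imag:
        batch += rng.choices(imagined, k=n_imag)
    rng.shuffle(batch)
    return batch
```

In this sketch, the pairs returned by `grounding_examples` would feed a supervised fine-tuning loop for the LLM, and the batches returned by `sample_batch` would feed whichever offline RL algorithm trains the policy. Keeping the `imaginary` flag on each rollout makes it easy to measure separately how real and generated data contribute.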