Game On: Towards Language Models as RL Experimenters

Jingwei Zhang, Thomas Lampe, Abbas Abdolmaleki, Jost Tobias Springenberg, Martin Riedmiller

2024-09-05

Abstract: We propose an agent architecture that automates parts of the common reinforcement learning experiment workflow, to enable automated mastery of control domains for embodied agents. To do so, it leverages a VLM to perform some of the capabilities normally required of a human experimenter, including the monitoring and analysis of experiment progress, the proposition of new tasks based on past successes and failures of the agent, decomposing tasks into a sequence of subtasks (skills), and retrieval of the skill to execute - enabling our system to build automated curricula for learning. We believe this is one of the first proposals for a system that leverages a VLM throughout the full experiment cycle of reinforcement learning. We provide a first prototype of this system, and examine the feasibility of current models and techniques for the desired level of automation. For this, we use a standard Gemini model, without additional fine-tuning, to provide a curriculum of skills to a language-conditioned Actor-Critic algorithm, in order to steer data collection so as to aid learning new skills. Data collected in this way is shown to be useful for learning and iteratively improving control policies in a robotics domain. Additional examination of the ability of the system to build a growing library of skills, and to judge the progress of the training of those skills, also shows promising results, suggesting that the proposed architecture provides a potential recipe for fully automated mastery of tasks and domains for embodied agents.

Artificial Intelligence, Robotics

What problem does this paper attempt to address?

This paper addresses how to automate parts of the reinforcement learning (RL) experiment workflow, especially for embodied agents. The authors propose a system architecture in which a vision-language model (VLM) takes over several functions normally performed by a human experimenter:

1. **Task Proposal**: Based on the set of tasks already known, propose new tasks for the agent to execute or learn.
2. **Task Decomposition**: Decompose a high-level task into a sequence of low-level skills.
3. **Skill Retrieval**: Retrieve the most suitable skill from the existing skill library for a given subtask.
4. **Training Progress Judgment**: Assess whether training of a set of skills is complete and decide whether to start a new round of data collection for subsequent reinforcement learning.

Together, these functions aim to reduce the human intervention needed during an RL experiment, moving towards automated mastery of tasks and domains. The paper also examines whether currently available VLMs and prompting techniques are adequate for this level of automation, and demonstrates feasibility with a first prototype. The prototype uses a standard Gemini model, without additional fine-tuning, to provide a curriculum of skills to a language-conditioned Actor-Critic algorithm, steering data collection so as to aid the learning of new skills. The data collected in this way is shown to be useful for learning and iteratively improving control policies in a robotics domain. The system's ability to build a growing skill library and to judge training progress is also examined, with promising results.
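The four functions described above can be pictured as one planning step of an experiment loop. The following is a minimal illustrative sketch only, not the authors' implementation: `StubExperimenter`, `plan_episode`, the skill names, and the 0.8 success threshold are all hypothetical, and simple rules stand in for what would be prompts to a VLM in the proposed system.

```python
from typing import Dict, List, Optional, Tuple


class StubExperimenter:
    """Rule-based stand-in for the VLM (hypothetical; a real system would prompt a model)."""

    def propose_task(self, library: List[str]) -> str:
        # Task proposal: pick a new task based on which skills are already known.
        if {"reach", "grasp"} <= set(library):
            return "stack"
        return "grasp" if "reach" in library else "reach"

    def decompose(self, task: str) -> List[str]:
        # Task decomposition: map a high-level task to a sequence of subtasks.
        plans = {"stack": ["reach", "grasp", "lift", "place"]}
        return plans.get(task, [task])

    def retrieve_skill(self, subtask: str, library: List[str]) -> Optional[str]:
        # Skill retrieval: exact match here; a VLM would match free-form descriptions.
        return subtask if subtask in library else None

    def training_done(self, success_rates: Dict[str, float],
                      threshold: float = 0.8) -> bool:
        # Progress judgment: assumed rule that every skill must clear a threshold.
        return bool(success_rates) and all(
            rate >= threshold for rate in success_rates.values()
        )


def plan_episode(exp: StubExperimenter,
                 library: List[str]) -> Tuple[str, List[str], List[str]]:
    """Propose a task, decompose it, and split its steps into known and missing skills."""
    task = exp.propose_task(library)
    known: List[str] = []
    missing: List[str] = []
    for step in exp.decompose(task):
        skill = exp.retrieve_skill(step, library)
        if skill is None:
            missing.append(step)   # must be learned from newly collected data
        else:
            known.append(skill)    # can be executed by the current policy
    return task, known, missing


if __name__ == "__main__":
    exp = StubExperimenter()
    print(plan_episode(exp, ["reach", "grasp"]))
```

In this sketch, the skills reported as missing are the ones a new round of data collection and policy training would target, and a skill would join the library only once `training_done` reports that its success rate is high enough.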

From Goal-Conditioned to Language-Conditioned Agents via Vision-Language Models

Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Empowering Large Language Model Agents through Action Learning

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents

Mental Modeling of Reinforcement Learning Agents by Language Models

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Language Models Meet World Models: Embodied Experiences Enhance Language Models

Embodied Executable Policy Learning with Language-based Scene Summarization

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models

Large Language Models as General Pattern Machines

Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation