Game On: Towards Language Models as RL Experimenters

Jingwei Zhang, Thomas Lampe, Abbas Abdolmaleki, Jost Tobias Springenberg, Martin Riedmiller
2024-09-05
Abstract: We propose an agent architecture that automates parts of the common reinforcement learning experiment workflow, to enable automated mastery of control domains for embodied agents. To do so, it leverages a VLM to perform some of the capabilities normally required of a human experimenter, including the monitoring and analysis of experiment progress, the proposition of new tasks based on past successes and failures of the agent, decomposing tasks into a sequence of subtasks (skills), and retrieval of the skill to execute - enabling our system to build automated curricula for learning. We believe this is one of the first proposals for a system that leverages a VLM throughout the full experiment cycle of reinforcement learning. We provide a first prototype of this system, and examine the feasibility of current models and techniques for the desired level of automation. For this, we use a standard Gemini model, without additional fine-tuning, to provide a curriculum of skills to a language-conditioned Actor-Critic algorithm, in order to steer data collection so as to aid learning new skills. Data collected in this way is shown to be useful for learning and iteratively improving control policies in a robotics domain. Additional examination of the ability of the system to build a growing library of skills, and to judge the progress of the training of those skills, also shows promising results, suggesting that the proposed architecture provides a potential recipe for fully automated mastery of tasks and domains for embodied agents.
Artificial Intelligence, Robotics
What problem does this paper attempt to address?
This paper aims to automate most of the reinforcement learning (RL) experiment workflow, particularly for embodied agents. The authors propose a system architecture that uses a vision-language model (VLM) to provide the following functions:

1. **Task Proposal**: Propose new tasks for the agent to execute or learn, based on the set of tasks already known.
2. **Task Decomposition**: Decompose high-level tasks into a sequence of low-level skills.
3. **Skill Retrieval**: Retrieve the most suitable skills from the existing skill library to complete specific subtasks.
4. **Training Progress Judgment**: Evaluate whether the training of a set of skills has finished and decide whether to start a new round of data collection for subsequent reinforcement learning.

Through these functions, the system aims to reduce the need for human intervention during RL experiments, working towards automated mastery of tasks and domains. The paper also examines whether currently available VLMs and prompting techniques are sufficient for this level of automation, and demonstrates feasibility with a prototype. The prototype uses a standard Gemini model, without additional fine-tuning, to provide a curriculum of skills to a language-conditioned Actor-Critic algorithm, steering data collection to aid the learning of new skills. The experiments show that data collected in this way is useful for learning and iteratively improving control policies in a robotics domain. The system's ability to build a growing skill library and to judge training progress also shows promising results.
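
The four functions described above can be pictured as one loop around the skill library. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `ExperimentLoop`, its method names, the word-overlap retrieval, the plateau check, and the example skill names are all hypothetical stand-ins. In the paper these roles are filled by prompting a Gemini model; here each role is a plain Python callable so the control flow can be run without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentLoop:
    """Hypothetical container for the four VLM-backed roles (names are assumptions)."""
    propose: Callable[[List[str]], str]         # skill library -> new task
    decompose: Callable[[str], List[str]]       # task -> free-text steps
    retrieve: Callable[[str, List[str]], str]   # step + library -> known skill
    is_trained: Callable[[List[float]], bool]   # learning curve -> training done?
    library: List[str] = field(default_factory=list)

    def plan_round(self) -> List[str]:
        """Propose a task, decompose it, and map each step to a library skill."""
        task = self.propose(self.library)
        steps = self.decompose(task)
        return [self.retrieve(step, self.library) for step in steps]

    def finish_round(self, new_skills: List[str], curve: List[float]) -> bool:
        """If training is judged complete, grow the library and report True."""
        if not self.is_trained(curve):
            return False
        self.library.extend(s for s in new_skills if s not in self.library)
        return True


def overlap_retrieve(step: str, library: List[str]) -> str:
    """Stand-in for VLM retrieval: pick the skill sharing the most words."""
    words = set(step.lower().split())
    return max(library, key=lambda s: len(words & set(s.lower().split())))


def plateaued(curve: List[float], window: int = 3, tol: float = 0.01) -> bool:
    """Stand-in for VLM progress judgment: the last `window` values barely move."""
    if len(curve) < window:
        return False
    recent = curve[-window:]
    return max(recent) - min(recent) < tol


# Illustrative run with made-up skills and rewards.
loop = ExperimentLoop(
    propose=lambda lib: "stack red on blue",
    decompose=lambda task: ["reach the red block", "lift the red block"],
    retrieve=overlap_retrieve,
    is_trained=plateaued,
    library=["reach red", "lift red", "reach blue"],
)
plan = loop.plan_round()
still_learning = loop.finish_round(["stack red on blue"], [0.1, 0.3, 0.5])
done = loop.finish_round(["stack red on blue"], [0.1, 0.5, 0.80, 0.805, 0.802])
```

Keeping each role behind a callable means a stub can be swapped for a real model call without touching the loop, which mirrors the summary's point that the same model serves all four roles without fine-tuning.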