Abstract: One of the final frontiers in the development of complex human-AI collaborative systems is the ability of AI agents to comprehend natural language and perform tasks accordingly. However, training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge due to the complexity and ambiguity of language and the sparsity of rewards, among other factors. Several advances in reinforcement learning, curriculum learning, continual learning, and language models have independently contributed to effective training of grounded agents in various environments. Leveraging these developments, we present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL), that introduces a teacher-instructor-student curriculum learning framework for training an RL agent that can follow natural language instructions and generalize to previously unseen ones. In this multi-agent framework, the teacher and the student agents learn simultaneously based on the student's current skill level. We further demonstrate the necessity of training the student agent with not just one, but multiple teacher agents. Experiments on a complex sparse-reward environment validate the effectiveness of our proposed approach.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper addresses how to enable Reinforcement Learning (RL) agents to understand natural language and execute tasks from instructions in complex human-AI collaboration systems. Specifically, it focuses on the challenges of training RL agents that can understand and execute natural language instructions in sparse-reward environments.

### Main Challenges

1. **Complexity and Ambiguity of Natural Language**: Natural language is complex and ambiguous, making it difficult for RL agents to interpret instructions accurately.
2. **Sparse Reward Problem**: In many tasks, agents receive a reward only after achieving a specific goal, so they get little learning signal along the way.
3. **Generalization Ability**: Agents need to handle previously unseen language instructions and execute tasks effectively in new environments.

### Solution

To address these challenges, the paper proposes a new algorithm called GLIDE-RL, which trains RL agents within a teacher-instructor-student framework. Specifically:

1. **Teacher Agent**: The teacher agent performs complex tasks in the environment and proposes goals. Because the teacher has achieved these goals itself, they are guaranteed to be feasible.
2. **Instructor Agent**: The instructor agent observes the teacher's behavior, describes it as natural language instructions, and generates multiple synonymous instructions to improve the student's generalization ability.
3. **Student Agent**: The student agent is a goal-conditioned RL agent that executes tasks based on the natural language instructions provided by the instructor and gradually learns how to complete them.

### Experimental Validation

The paper conducts experiments in a complex sparse-reward environment to validate the effectiveness of GLIDE-RL. The results show that with the help of multiple teachers and instructors, the student agent not only learns better during training but also generalizes well to unseen goals and instructions.

### Conclusion

By introducing a multi-teacher and instructor framework, the paper addresses the challenges of training RL agents that can understand and execute natural language instructions in sparse-reward environments. The experimental results validate the effectiveness of this method and indicate its potential for complex tasks.
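The interaction between the teacher, instructor, and student roles named in the abstract can be illustrated with a toy loop. The following is only a minimal sketch under strong assumptions, not the authors' implementation: the environment is reduced to picking one of `N_CELLS` cells, the instructor is a set of hand-written templates, the student is a tabular epsilon-greedy learner, and the teacher simply favors goals the student currently fails at. All names (`teacher_propose`, `instructor_describe`, `Student`, `train`) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical instruction templates standing in for synonymous phrasings.
TEMPLATES = ["go to cell {n}", "move to position {n}", "walk to square {n}"]
N_CELLS = 4  # assumed toy goal space: one goal per cell


def teacher_propose(rng, success_rate):
    """Pick a goal cell, favoring cells the student currently fails at."""
    cells = list(range(N_CELLS))
    weights = [1.0 - success_rate[c] + 0.05 for c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


def instructor_describe(rng, cell):
    """Turn the teacher's goal into one of several synonymous instructions."""
    return rng.choice(TEMPLATES).format(n=cell)


def sparse_reward(reached, goal):
    """Reward is 1 only when the goal is met exactly, 0 otherwise."""
    return 1.0 if reached == goal else 0.0


class Student:
    """Tabular, instruction-conditioned epsilon-greedy learner (toy stand-in)."""

    def __init__(self, epsilon=0.2, lr=0.5):
        self.q = defaultdict(lambda: [0.0] * N_CELLS)
        self.epsilon = epsilon
        self.lr = lr

    def act(self, rng, instruction, greedy=False):
        values = self.q[instruction]
        if not greedy and rng.random() < self.epsilon:
            return rng.randrange(N_CELLS)
        best = max(values)
        # Break ties at random so untried actions still get explored.
        return rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, instruction, action, reward):
        values = self.q[instruction]
        values[action] += self.lr * (reward - values[action])


def train(episodes=3000, seed=0):
    rng = random.Random(seed)
    student = Student()
    success_rate = [0.0] * N_CELLS  # running estimate of student skill per goal
    for _ in range(episodes):
        goal = teacher_propose(rng, success_rate)
        instruction = instructor_describe(rng, goal)
        action = student.act(rng, instruction)
        reward = sparse_reward(action, goal)
        student.update(instruction, action, reward)
        success_rate[goal] += 0.1 * (reward - success_rate[goal])
    return student
```

Calling `train()` and then `student.act(rng, "go to cell 2", greedy=True)` queries the learned behavior for a seen instruction. Because this student is a lookup table keyed on the exact instruction string, it cannot handle unseen phrasings; generalizing to new instructions, as the paper reports, would require a learned language representation.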

GLIDE-RL: Grounded Language Instruction through DEmonstration in RL
