Abstract: Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.

What problem does this paper attempt to address?

The main aim of this paper is to address the limitations of large language models (LLMs) when they are applied to specific tasks, particularly decision-making in real-time dynamic environments. Specifically, the paper proposes solutions to the following issues:

1. **LLMs' lack of task specialization**: Although LLMs can handle complex sequential decision-making tasks and provide high-level instructions, they are not specialized for specific target problems, especially in real-time dynamic environments.
2. **High deployment costs**: Using LLMs for decision-making often requires substantial computational resources, such as memory and power, which makes deploying them in practical applications costly.
3. **Low sample efficiency in reinforcement learning (RL)**: Traditional RL methods often have low sample efficiency in complex, high-dimensional environments, especially when reward signals are sparse, which makes learning slow and costly.

To address these issues, the authors propose a framework called "LLM for Policy Teaching" (LLM4Teach). Its core idea is to use a pre-trained LLM as a teacher agent that guides a lightweight student RL agent in quickly acquiring decision-making capabilities for a specific task. In the early stages of training, the student learns to imitate the teacher by minimizing the difference between its actions and the teacher's. As learning progresses, the student gradually shifts from relying on the teacher to relying on environmental feedback; this is achieved by adjusting the relative weights of the teacher-guidance loss term and the conventional RL loss term in the learning objective.

The main contributions of this method include:

- Proposing a policy distillation method (LLM4Teach) to overcome the limitations of LLM-based and RL-based agents in embodied sequential decision-making.
- Demonstrating the effectiveness of the method through extensive experiments in challenging embodied environments, showing higher accuracy and lower computational burden than methods based solely on LLMs or RL.
- Highlighting that LLMs can produce various types of erroneous decisions in embodied settings, and that LLM4Teach provides an effective way to mitigate or avoid the impact of these errors.

Additionally, the paper verifies that having the LLM provide uncertainty-aware rather than deterministic guidance can improve the learning efficiency of the student agent.

In summary, this research aims to develop an agent system that learns quickly and solves problems efficiently by combining the reasoning capabilities of LLMs with the learning mechanisms of RL.
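The teacher-to-student handover and the uncertainty-aware guidance described above can be illustrated with a small, framework-free sketch. It assumes the LLM teacher is queried several times for the same state and that each answer has already been parsed into one of a fixed set of high-level option indices; a deterministic teacher keeps only the most frequent answer, while an uncertainty-aware teacher uses the answer frequencies as a soft target. The function names (`teacher_distribution`, `teacher_weight`, `student_loss`), the cross-entropy imitation term, and the linear decay schedule are hypothetical choices for illustration, not taken from the LLM4Teach code, and `rl_loss` is a placeholder for whatever RL objective the student optimizes.

```python
import math
from collections import Counter
from typing import List, Sequence


def teacher_distribution(sampled_options: Sequence[int], num_options: int,
                         deterministic: bool = False) -> List[float]:
    """Build a teacher policy over high-level options from repeated LLM answers.

    Assumption (hypothetical setup): the LLM was queried several times for the
    same state and each answer was parsed into an option index in [0, num_options).
    """
    counts = Counter(sampled_options)
    if deterministic:
        # Deterministic guidance: one-hot on the most frequent answer.
        best = counts.most_common(1)[0][0]
        return [1.0 if a == best else 0.0 for a in range(num_options)]
    # Uncertainty-aware guidance: empirical answer frequencies as a soft target.
    total = len(sampled_options)
    return [counts.get(a, 0) / total for a in range(num_options)]


def cross_entropy(target: Sequence[float], probs: Sequence[float],
                  eps: float = 1e-8) -> float:
    """H(target, probs); eps guards against log(0) for zero student probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def teacher_weight(step: int, decay_steps: int, w0: float = 1.0) -> float:
    """Hypothetical linear schedule: w0 at step 0, reaching 0 at decay_steps."""
    return w0 * max(0.0, 1.0 - step / decay_steps)


def student_loss(rl_loss: float, teacher_probs: Sequence[float],
                 student_probs: Sequence[float], step: int,
                 decay_steps: int) -> float:
    """RL loss plus an annealed teacher-imitation (distillation) term."""
    w = teacher_weight(step, decay_steps)
    return rl_loss + w * cross_entropy(teacher_probs, student_probs)


# Usage with made-up numbers: five LLM answers over four options.
samples = [2, 2, 0, 2, 1]
soft = teacher_distribution(samples, num_options=4)                      # [0.2, 0.2, 0.6, 0.0]
hard = teacher_distribution(samples, num_options=4, deterministic=True)  # [0.0, 0.0, 1.0, 0.0]
student = [0.25, 0.25, 0.25, 0.25]  # an untrained, uniform student policy
for step in (0, 500, 1000):
    # The teacher term shrinks over training; at step 1000 only rl_loss remains.
    print(step, round(student_loss(0.5, soft, student, step, decay_steps=1000), 4))
```

Keeping the RL term at full weight and decaying only the teacher term is one simple way to realize the gradual shift toward environmental feedback; a schedule that also ramps up the RL weight would be an equally plausible reading of the summary.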

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach

Words as Beacons: Guiding RL Agents with High-Level Language Prompts

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study

Large Language Models as Generalizable Policies for Embodied Tasks

LLM4RL: Enhancing Reinforcement Learning with Large Language Models

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Large Language Models as Agents in Two-Player Games

Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning

Empowering Large Language Model Agents through Action Learning

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents

Large Language Model-based Human-Agent Collaboration for Complex Task Solving

LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions