Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu

2024-02-07

Abstract: Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research faces challenges in meeting the requirement of open-endedness. Existing approaches typically train either the LLM or the RL model to adapt to a fixed counterpart, limiting exploration of novel skills and hindering the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and training a goal-conditioned policy for decision-making; (2) co-training to align the LLM and policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also executes them efficiently. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.

Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how to construct an embodied agent capable of handling open-ended tasks by integrating large language models (LLMs) and reinforcement learning (RL). Existing approaches typically train either the LLM or the RL model to adapt to a fixed counterpart, which limits the exploration of new skills and reduces the effectiveness of human-AI interaction. To overcome this, the paper proposes OpenPAL, a co-training framework intended to let the agent understand arbitrary instructions and execute them efficiently. OpenPAL achieves bidirectional adaptation through two training stages: first, a pre-trained LLM is fine-tuned to translate human instructions into planning goals, and a goal-conditioned policy is trained for decision-making; second, the LLM and the policy are aligned through co-training to achieve instruction open-endedness. Experiments on Contra, an open-ended FPS game, show that an agent trained with OpenPAL both understands arbitrary instructions and executes tasks efficiently, suggesting that OpenPAL has the potential to construct open-ended embodied agents in practical scenarios.
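
The data flow described above (instruction, then goal, then goal-conditioned action) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the grid-cell goal encoding, the keyword lookup standing in for the fine-tuned LLM, the greedy stepper standing in for the trained policy, and the names `toy_planner`, `toy_policy`, and `run_episode`. It shows only the interface between the two components; neither training stage, including the co-training alignment, is modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical goal encoding: a target (x, y) cell on a grid.
Goal = Tuple[int, int]


def toy_planner(instruction: str) -> Goal:
    """Stand-in for the fine-tuned LLM: map an instruction to a goal.

    A real system would query a language model; this keyword lookup is
    only a placeholder so the sketch runs.
    """
    table: Dict[str, Goal] = {"north": (0, 3), "east": (3, 0)}
    for keyword, goal in table.items():
        if keyword in instruction.lower():
            return goal
    return (0, 0)  # unknown instruction: stay at the origin


def toy_policy(state: Goal, goal: Goal) -> Goal:
    """Stand-in for the goal-conditioned policy: one greedy step toward the goal."""
    def step(s: int, g: int) -> int:
        # Move one cell toward g, or stay put if already there.
        return s + (g > s) - (g < s)

    return (step(state[0], goal[0]), step(state[1], goal[1]))


def run_episode(
    instruction: str, start: Goal = (0, 0), max_steps: int = 10
) -> Tuple[Goal, List[Goal]]:
    """Plan a goal from the instruction, then act until it is reached."""
    goal = toy_planner(instruction)
    state = start
    trajectory = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        state = toy_policy(state, goal)
        trajectory.append(state)
    return goal, trajectory


if __name__ == "__main__":
    goal, trajectory = run_episode("Go north")
    print(goal, trajectory)
```

The point of the split is that the policy never sees the raw instruction, only the goal, so the planner and the policy can be swapped or retrained independently as long as both agree on the goal format.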
