Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu
2024-02-07
Abstract: Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research struggles to meet the requirement of open-endedness. It typically trains either the LLM or the RL model to adapt to a fixed counterpart, which limits the exploration of novel skills and hinders the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and training a goal-conditioned policy for decision-making; (2) co-training to align the LLM and the policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also exhibits efficient execution. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to build an embodied agent that can handle open-ended tasks by integrating large language models (LLMs) with reinforcement learning (RL). Although existing work has made progress in combining the two, it typically trains only one side (either the LLM or the RL policy) to adapt to a fixed counterpart. This one-way adaptation limits the exploration of novel skills and reduces the effectiveness of human-AI interaction. The paper therefore proposes OpenPAL, a co-training framework that enables the agent to understand arbitrary instructions and execute them efficiently.

OpenPAL achieves bidirectional adaptation through two training stages. First, a pre-trained LLM is fine-tuned to translate human instructions into goals for planning, and a goal-conditioned policy is trained for decision-making. Second, the LLM and the policy are co-trained so that each aligns with the other, achieving instruction open-endedness. Experiments in Contra, an open-ended FPS game, show that an agent trained with OpenPAL both understands arbitrary instructions and executes tasks efficiently, suggesting that OpenPAL has the potential to build open-ended embodied agents in practical scenarios.
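The two-stage scheme can be illustrated with a deliberately tiny Python sketch. Every name in it (`ToyTranslator`, `ToyPolicy`, `stage1`, `co_train`, the goal strings, the learning rates and the baseline) is a hypothetical assumption made for illustration: the LLM is replaced by a table of instruction-goal scores and the RL policy by a table of per-goal success rates. It only shows the shape of bidirectional adaptation (the translator is rewarded for goals the policy can achieve, while the policy practices the goals the translator emits). It does not reproduce OpenPAL's actual models, objectives, or goal representation.

```python
from dataclasses import dataclass, field

# Hypothetical toy stand-ins: OpenPAL uses a fine-tuned LLM and an RL policy
# trained in the Contra FPS game; neither is modeled here.


@dataclass
class ToyTranslator:
    """Stand-in for the LLM: scores candidate goals for each instruction."""
    scores: dict = field(default_factory=dict)  # (instruction, goal) -> score

    def translate(self, instruction, candidates):
        # Pick the highest-scoring candidate goal (the first one wins on ties).
        return max(candidates, key=lambda g: self.scores.get((instruction, g), 0.0))

    def update(self, instruction, goal, signal, lr=0.5):
        key = (instruction, goal)
        self.scores[key] = self.scores.get(key, 0.0) + lr * signal


@dataclass
class ToyPolicy:
    """Stand-in for a goal-conditioned policy: per-goal success rate in [0, 1]."""
    competence: dict = field(default_factory=dict)  # goal -> success rate

    def execute(self, goal):
        return self.competence.get(goal, 0.0)

    def train_on(self, goal, lr=0.2):
        # Move this goal's success rate a fraction of the way toward 1.
        c = self.competence.get(goal, 0.0)
        self.competence[goal] = c + lr * (1.0 - c)


def stage1(translator, policy, labeled_pairs, goal_pool, epochs=3):
    # (1a) Fine-tune the translator on labeled instruction -> goal pairs.
    for instruction, goal in labeled_pairs:
        translator.update(instruction, goal, 1.0)
    # (1b) Train the policy on goals chosen independently of any instruction.
    for _ in range(epochs):
        for goal in goal_pool:
            policy.train_on(goal)


def co_train(translator, policy, instructions, candidates, rounds=5, baseline=0.5):
    """Stage 2: each side adapts to the other; returns mean success per round."""
    history = []
    for _ in range(rounds):
        total = 0.0
        for instruction in instructions:
            goal = translator.translate(instruction, candidates[instruction])
            success = policy.execute(goal)
            # Translator side: reinforce goals the policy can already achieve.
            translator.update(instruction, goal, success - baseline)
            # Policy side: practice the goals the translator actually emits.
            policy.train_on(goal)
            total += success
        history.append(total / len(instructions))
    return history


# Usage with made-up instructions and goals.
translator, policy = ToyTranslator(), ToyPolicy()
candidates = {
    "take cover": ["crouch_behind_wall", "move_to_cover"],
    "attack enemy": ["fire_at_enemy", "approach_enemy"],
}
stage1(
    translator,
    policy,
    labeled_pairs=[("take cover", "move_to_cover"), ("attack enemy", "fire_at_enemy")],
    goal_pool=["fire_at_enemy", "crouch_behind_wall"],
)
history = co_train(translator, policy, list(candidates), candidates)
print([round(h, 3) for h in history])
```

In this toy run the translator's score for `move_to_cover` first shrinks, because the policy has never practiced that goal, and the policy's success on it rises because it is trained on what the translator emits. If the policy could not improve on that goal, the translator's score for it would keep falling until `crouch_behind_wall`, a goal the policy already handles, became competitive. That pull in both directions is the intuition behind co-training, under the stated simplifications.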