Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao

2024-10-11

Abstract: Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with the environment, which is impractical in real scenarios. For another, designing reward functions manually requires heavy engineering effort. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) sample efficiency; (2) minimal and effective reward engineering; (3) agnosticism to the form of the foundation models and robustness to noisy priors. Our method achieves strong performance on various manipulation tasks, both on real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-World benchmark, FAC achieves 100% success rates in 7/8 tasks with fewer than 100k frames (about 1 hour of training), outperforming baseline methods trained with manually designed rewards for 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks. Visualizations and code are available at https://yewr.github.io/rlfp.
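
The abstract describes the reward as automatic: it comes from success-reward and value foundation models rather than from hand-engineering. Below is a minimal sketch of one way such priors could be combined, assuming a sparse 0/1 success signal plus potential-based reward shaping from the value prior. Potential-based shaping is a standard technique; the paper's exact formulation may differ. `success_prior`, `value_prior`, `shaped_reward`, and the toy reaching task are hypothetical stand-ins, not the authors' code.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    state: State,
    next_state: State,
    success_prior: Callable[[State], bool],
    value_prior: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Sparse success reward plus potential-based shaping from a value prior.

    success_prior and value_prior are hypothetical stand-ins for a
    success-reward foundation model and a value foundation model.
    """
    # Sparse 0/1 task reward from the (hypothetical) success detector.
    r_success = 1.0 if success_prior(next_state) else 0.0
    # Potential-based shaping term F(s, s') = gamma * V(s') - V(s).
    shaping = gamma * value_prior(next_state) - value_prior(state)
    return r_success + shaping


# Toy 1-D reaching task (illustrative only): the goal sits at x = 1.0.
def toy_value_prior(state: State) -> float:
    # Higher (less negative) value the closer the state is to the goal.
    return -abs(1.0 - state[0])


def toy_success_prior(state: State) -> bool:
    # Declare success within 0.05 of the goal.
    return abs(1.0 - state[0]) < 0.05


if __name__ == "__main__":
    # Moving from 0.5 to 0.7 earns a positive shaped reward (about 0.203).
    print(shaped_reward([0.5], [0.7], toy_success_prior, toy_value_prior))
```

Potential-based shaping is known to leave the optimal policy of the underlying task unchanged, so an imperfect value prior mainly affects how quickly the agent finds the goal, not which behavior is ultimately optimal. That property makes it a natural fit when the value prior may be noisy.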

Robotics, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses two major challenges that reinforcement learning (RL) algorithms face in the real world: low sample efficiency and laborious reward design. First, current RL algorithms typically require millions of environment interactions, which is impractical on physical robots. Second, manually designing reward functions is time-consuming and demands significant engineering effort. To address these issues, the paper proposes using foundation models as sources of prior knowledge (policy, value, and success-reward priors) to guide and accelerate learning, so that RL agents can explore and learn autonomously and efficiently in the physical world. The main contributions are:

1. **The Reinforcement Learning with Foundation Priors (RLFP) framework**: it systematically introduces three types of priors (policy, value, and success reward) and explains how existing foundation models can serve as sources of these priors.
2. **The Foundation-guided Actor-Critic (FAC) algorithm**: built on RLFP, it uses the policy, value, and success-reward priors to guide the learning process.
3. **Empirical validation of FAC**: experiments on real robots and in simulation show that FAC is sample efficient, requires minimal reward engineering, and is robust to different forms of foundation models and to noisy priors.

Through these contributions, the paper opens new possibilities for robots to learn autonomously and perform more tasks in the physical world.
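
The summary says only that FAC uses the policy prior, alongside the value and success-reward priors, to guide actor-critic learning. One plausible mechanism, assumed here purely for illustration, is a KL penalty that keeps a Gaussian actor close to a Gaussian prior policy while the actor still maximizes the critic's Q-value. The functions `diag_gaussian_kl` and `prior_regularized_actor_loss` and the weight `alpha` are hypothetical; this is a sketch, not the paper's loss.

```python
import math
from typing import Sequence


def diag_gaussian_kl(
    mu_p: Sequence[float],
    std_p: Sequence[float],
    mu_q: Sequence[float],
    std_q: Sequence[float],
) -> float:
    """KL(p || q) for diagonal Gaussians, summed over action dimensions."""
    if not (len(mu_p) == len(std_p) == len(mu_q) == len(std_q)):
        raise ValueError("all inputs must have the same action dimension")
    total = 0.0
    for m1, s1, m2, s2 in zip(mu_p, std_p, mu_q, std_q):
        # Closed-form KL between two 1-D Gaussians N(m1, s1^2) and N(m2, s2^2).
        total += math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5
    return total


def prior_regularized_actor_loss(
    q_value: float,
    mu: Sequence[float],
    std: Sequence[float],
    prior_mu: Sequence[float],
    prior_std: Sequence[float],
    alpha: float = 0.1,
) -> float:
    """Actor loss: maximize Q while staying close to a (hypothetical) policy prior."""
    # Minimizing -Q pushes the actor toward high-value actions; the KL term,
    # weighted by the hypothetical coefficient alpha, pulls it toward the prior.
    return -q_value + alpha * diag_gaussian_kl(mu, std, prior_mu, prior_std)


if __name__ == "__main__":
    # Actor N(0, 1) vs. prior N(1, 1): KL = 0.5, so loss = -2.0 + 0.1 * 0.5.
    print(prior_regularized_actor_loss(2.0, [0.0], [1.0], [1.0], [1.0]))
```

A larger `alpha` makes the actor imitate the prior more closely, which can speed up early exploration when the prior is good but may slow learning when it is noisy; in practice such a weight is often tuned or annealed.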
