Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao
2024-10-11
Abstract: Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply the RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with environments, which are impractical in real scenarios. For another, it is necessary to make heavy engineering efforts to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) sample efficient; (2) minimal and effective reward engineering; (3) agnostic to foundation model forms and robust to noisy priors. Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100% success rates in 7/8 tasks under less than 100k frames (about 1-hour training), outperforming baseline methods with manual-designed rewards in 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks. Visualizations and code are available at https://yewr.github.io/rlfp.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses two major obstacles to applying Reinforcement Learning (RL) in the real world: low sample efficiency and labor-intensive reward function design. RL algorithms typically require millions of environment interactions, which is impractical on physical robots, and manually designing reward functions demands significant engineering effort. To address both issues, the paper uses Foundation Models as sources of prior knowledge about policy, value, and success rewards, which guide and accelerate the learning process so that agents can explore and learn autonomously in the physical world more efficiently.

The main contributions of the paper include:

1. **Proposing the Reinforcement Learning with Foundation Priors (RLFP) framework**: Systematically introducing three types of prior knowledge (policy, value, and success rewards) and explaining how to use existing Foundation Models as sources of these priors.
2. **Proposing the Foundation-guided Actor-Critic (FAC) algorithm**: Building on the RLFP framework, using the policy, value, and success-reward priors to guide the learning process.
3. **Empirically demonstrating the effectiveness of FAC**: Experiments on real robots and in simulated environments show that FAC is sample efficient, needs minimal reward engineering, and is agnostic to the form of the Foundation Models and robust to noisy priors.

Through these contributions, the paper opens new possibilities for future robots to learn autonomously and perform more tasks in the physical world.
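
The summary above does not spell out how FAC combines the three priors. As a minimal sketch under assumed choices (not the authors' implementation), the value prior can be used for potential-based reward shaping, the success prior as a sparse 0/1 reward, and the policy prior as a regularizer on the actor. All names here (`shaped_reward`, `regularized_actor_loss`, `toy_value`, `toy_success`, `beta`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    state: State,
    next_state: State,
    value_prior: Callable[[State], float],
    success_prior: Callable[[State], bool],
    gamma: float = 0.99,
) -> float:
    """Sparse 0/1 success reward plus a potential-based shaping term.

    Assumption: the value prior acts as the potential function, so the
    shaping term is gamma * V(s') - V(s).
    """
    success = 1.0 if success_prior(next_state) else 0.0
    shaping = gamma * value_prior(next_state) - value_prior(state)
    return success + shaping


def regularized_actor_loss(
    q_value: float,
    policy_action: Sequence[float],
    prior_action: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Actor loss that maximizes Q while staying close to the policy prior.

    Assumption: closeness is measured by squared Euclidean distance between
    the agent's action and the action suggested by the policy prior.
    """
    distance = sum((a - b) ** 2 for a, b in zip(policy_action, prior_action))
    return -q_value + beta * distance


# Toy 1-D reaching task (hypothetical): state = [gripper position], goal x = 1.0.
def toy_value(state: State) -> float:
    # Stand-in for a value foundation model: higher when closer to the goal.
    return -abs(1.0 - state[0])


def toy_success(state: State) -> bool:
    # Stand-in for a success-reward foundation model: a binary success check.
    return abs(1.0 - state[0]) < 0.05


if __name__ == "__main__":
    # A step toward the goal earns a positive shaping bonus before success.
    print(shaped_reward([0.0], [0.5], toy_value, toy_success))
    # The actor is penalized for deviating from the prior's suggested action.
    print(regularized_actor_loss(2.0, [0.5], [1.0]))
```

In this sketch the agent receives a dense learning signal from the value prior without any hand-written reward, while the success prior supplies the only task-specific reward. The weight `beta` controls how strongly the actor follows the policy prior; lowering it would let the agent deviate more from a noisy prior.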