A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models

Nils Ingelhag, Jesper Munkeby, Jonne van Haastregt, Anastasia Varava, Michael C. Welle, Danica Kragic
2024-03-25
Abstract: In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundational models, to obtain a robotic skill learning system. The system can obtain new skills via the behavioral cloning approach of visuomotor diffusion policies given teleoperated demonstrations. Foundational models are used to perform skill selection given the user's prompt in natural language. Before executing a skill, the foundational model performs a precondition check given an observation of the workspace. We compare the performance of different foundational models to this end and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food serving scenario in the real world. Videos of all experimental executions, as well as the process of teaching new skills in simulation and the real world, are available on the project's website.
Robotics
What problem does this paper attempt to address?
The paper addresses how a robot can acquire and carry out the various tasks a user requests in a given environment. It proposes a Robotic Skill Learning System (RSLS) built on Diffusion Policies and large pre-trained multimodal foundation models. The system learns new skills from teleoperated user demonstrations through behavioral cloning with visuomotor diffusion policies, and it uses a foundation model to select the appropriate skill from the user's natural-language prompt. Before a skill is executed, the foundation model checks the skill's preconditions against an observation of the workspace. The paper compares different foundation models on skill selection and precondition checking, evaluates the user-taught skills in simulation and in the real world, and demonstrates the combined system in a real-world food serving scenario. In short, the goal is a robotic system that learns new skills from user demonstrations and selects which skill to execute from natural-language instructions, so that the robot can serve varied user requests.
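The control flow described above (select a skill from the prompt, check its precondition against a workspace observation, then run the learned policy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names Skill, run_prompt, keyword_selector and toy_checker are hypothetical, the foundation model is replaced by simple stub functions, the observation is a plain dictionary rather than a camera image, and the diffusion policy is a placeholder callable that returns a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins; the paper's actual interfaces are not given in the source.
Observation = Dict[str, bool]                       # placeholder for a workspace image
Selector = Callable[[str, List[str]], Optional[str]]  # (prompt, skill names) -> chosen name
Checker = Callable[[str, Observation], bool]          # (precondition, observation) -> satisfied?


@dataclass
class Skill:
    name: str
    precondition: str                      # condition that must hold before execution
    policy: Callable[[Observation], str]   # placeholder for a learned diffusion policy


def run_prompt(prompt: str, skills: Dict[str, Skill], select: Selector,
               check: Checker, obs: Observation) -> str:
    """Select a skill, verify its precondition, then execute its policy."""
    name = select(prompt, list(skills))
    if name is None or name not in skills:
        return "no matching skill"
    skill = skills[name]
    if not check(skill.precondition, obs):
        return f"precondition failed for {name}"
    return skill.policy(obs)


def keyword_selector(prompt: str, names: List[str]) -> Optional[str]:
    """Stub for foundation-model skill selection: first skill name found in the prompt."""
    lowered = prompt.lower()
    for candidate in names:
        if candidate in lowered:
            return candidate
    return None


def toy_checker(precondition: str, obs: Observation) -> bool:
    """Stub for the foundation-model precondition check: look up a flag in the observation."""
    return obs.get(precondition, False)


skills = {"pour": Skill("pour", "cup_present", lambda obs: "executed pour")}
result = run_prompt("Please pour the water", skills, keyword_selector,
                    toy_checker, {"cup_present": True})
```

In a real system the two stubs would be replaced by queries to a multimodal foundation model, and the policy callable by a trained visuomotor diffusion policy. Keeping the precondition check between selection and execution means a skill is never started when the workspace is not in a suitable state.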