Polybot: Training One Policy Across Robots While Embracing Variability

Jonathan Yang, Dorsa Sadigh, Chelsea Finn
2023-07-08
Abstract: Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets. However, robotic platforms possess varying control schemes, camera viewpoints, kinematic configurations, and end-effector morphologies, posing significant challenges when transferring manipulation skills from one platform to another. To tackle this problem, we propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras and a unified, but modular codebase. To bridge the remaining domain shift, we align our policy's internal representations across embodiments through contrastive learning. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes: the WidowX 250S, the Franka Emika Panda, and the Sawyer. Our results demonstrate significant improvements in success rate and sample efficiency for our policy when using new task data collected on a different robot, validating our proposed design decisions. More details and videos can be found on our anonymized project website: https://sites.google.com/view/polybot-multirobot
Robotics, Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenge of reusing large-scale datasets across different robotic platforms. Because platforms differ in control schemes, camera viewpoints, kinematic configurations, and end-effector morphologies, transferring manipulation skills from one platform to another incurs a substantial domain shift. To address this, the authors propose a set of key design decisions for training a single policy that can be deployed on multiple robotic platforms.

### Main Issues

1. **Domain Shift**: How to use data effectively across different robotic platforms while reducing the impact of domain shift.
2. **Multi-Robot Generalization**: How to make design decisions that let a single policy adapt to various robotic platforms, improving task success rates and sample efficiency.

### Solutions

1. **Observation Space Alignment**: Align the observation spaces of different robotic platforms by using wrist cameras and a unified but modular codebase.
2. **Action Space Alignment**: Align the action spaces of different robotic platforms through a shared high-level control environment and inverse kinematics solvers.
3. **Internal Representation Alignment**: Reduce the remaining domain shift by aligning the policy's internal representations across embodiments through contrastive learning.

### Experimental Validation

The authors ran experiments on a multi-task, multi-robot dataset collected over 60 hours, spanning 6 tasks and 3 robots (the WidowX 250S, the Franka Emika Panda, and the Sawyer). The results show that the proposed method significantly improves success rates and sample efficiency on new tasks, especially tasks that require 6-degree-of-freedom motion.

### Main Contributions

1. **End-to-End Closed-Loop Policy Learning Pipeline**: Reuses and benefits from data collected on other robotic platforms.
2. **Multi-Head Policy Training**: Learns the specific dynamics of each robot through separate policy heads while keeping visual features and overall motion direction consistent.
3. **Internal Representation Alignment**: Further improves cross-platform task generalization by aligning the policy's internal representations through contrastive learning.

### Experimental Results

- In zero-shot and few-shot scenarios, the proposed method significantly outperforms baselines that use data from a single robot on new tasks.
- Multi-head policy training outperforms blocking controllers in tasks requiring 6-degree-of-freedom motion.
- Aligning internal representations with contrastive learning yields a significant improvement, with an average success rate increase of 19%.

Through these design decisions and experimental validation, the paper provides an effective approach to data reuse and task generalization across multiple robotic platforms.
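
To make the internal representation alignment idea more concrete, here is a minimal sketch of a contrastive objective in NumPy. It is not the authors' implementation. It uses a generic InfoNCE-style loss, and it assumes that embeddings from two robots have already been paired, so that row `i` from one robot corresponds to row `i` from the other. The function name `info_nce_loss`, the temperature value, the pairing strategy, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss (illustrative, not the paper's exact objective).

    Row i of z_a and row i of z_b are assumed to be a positive pair,
    e.g. embeddings of corresponding states from two different robots.
    All other rows of z_b act as negatives for row i of z_a.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarity matrix, shape (N, N), scaled by the temperature.
    logits = z_a @ z_b.T / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


# Synthetic stand-ins for policy embeddings (assumption: 8 states, 16 dims).
rng = np.random.default_rng(0)
robot_a = rng.normal(size=(8, 16))

# Well-aligned case: the second robot's embeddings nearly match the first's.
aligned_b = robot_a + 0.01 * rng.normal(size=(8, 16))

# Misaligned case: same embeddings, but paired with the wrong states.
misaligned_b = np.roll(aligned_b, shift=1, axis=0)

loss_aligned = info_nce_loss(robot_a, aligned_b)
loss_misaligned = info_nce_loss(robot_a, misaligned_b)
print(f"aligned: {loss_aligned:.4f}, misaligned: {loss_misaligned:.4f}")
```

Minimizing a loss of this shape pulls paired embeddings from different robots together and pushes unpaired ones apart, which is the general mechanism the summary refers to. How positive pairs are chosen across embodiments is the key design choice, and it is left unspecified here.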