PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks

Markus Grotz, Mohit Shridhar, Tamim Asfour, Dieter Fox
2024-08-01
Abstract: Bimanual manipulation is challenging due to the precise spatial and temporal coordination required between two arms. While there exist several real-world bimanual systems, there is a lack of simulated benchmarks with a large task diversity for systematically studying bimanual capabilities across a wide range of tabletop tasks. This paper addresses the gap by extending RLBench to bimanual manipulation. We open-source our code and benchmark comprising 13 new tasks with 23 unique task variations, each requiring a high degree of coordination and adaptability. To kickstart the benchmark, we extended several state-of-the-art methods to bimanual manipulation and also present a language-conditioned behavioral cloning agent, PerAct2, which enables the learning and execution of bimanual 6-DoF manipulation tasks. Our novel network architecture efficiently integrates language processing with action prediction, allowing robots to understand and perform complex bimanual tasks in response to user-specified goals. Project website with code is available at: http://bimanual.github.io
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
### What problem does this paper attempt to address?

This paper addresses the **benchmarking and learning problems of bimanual manipulation tasks**. Bimanual manipulation requires precise spatial and temporal coordination between two robotic arms, which makes it more challenging than single-arm manipulation. However, there is currently no simulation benchmark that supports systematic study of bimanual manipulation capabilities across a wide range of tabletop tasks.

To fill this gap, the authors extended the existing robot-learning benchmark RLBench to bimanual manipulation. They open-sourced the code and the benchmark, which comprises 13 new bimanual manipulation tasks with 23 unique task variations, each requiring a high degree of coordination and adaptability.

In addition, the authors propose a language-conditioned behavioral cloning agent, PerAct2, which extends the PerAct framework. PerAct2 learns and executes 6-DoF bimanual manipulation tasks, and its network architecture integrates language processing with action prediction, enabling the robot to understand and perform complex bimanual tasks in response to user-specified goals.

#### Main contributions:

1. **A new benchmark**: It contains 13 bimanual manipulation tasks with 23 unique task variations.
2. **A new network architecture, PerAct2**: Based on the PerAct framework, it predicts bimanual manipulation actions.
3. **Qualitative experiments in the real world**: These are used to verify the effectiveness of the method.

Through these contributions, the authors hope to advance research on bimanual robot manipulation and to provide richer benchmarks and improved generalization capabilities for future skill learning.