Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn
2023-04-24
Abstract: Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/
Robotics, Machine Learning
What problem does this paper attempt to address?
This paper asks whether learning can enable low-cost, imprecise hardware to perform fine-grained bimanual manipulation. Tasks such as threading cable ties or slotting a battery require precision, careful coordination of contact forces, and closed-loop visual feedback, so they usually call for high-end robots, accurate sensors, or careful calibration, all of which are costly and difficult to set up. The paper proposes a low-cost, open-source hardware system named ALOHA, with demonstrations collected through a custom teleoperation interface, and pairs it with a new imitation-learning algorithm, Action Chunking with Transformers (ACT). ACT learns a generative model over action sequences, which addresses two weaknesses of imitation learning in high-precision domains: policy errors that compound over time and human demonstrations that are non-stationary. With only 10 minutes of human demonstrations, the robot learns 6 difficult real-world tasks, such as opening a translucent condiment cup and slotting a battery, with 80-90% success.
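The idea of predicting action sequences rather than single actions can be illustrated with a toy rollout loop. This is only a minimal sketch, assuming that "action chunking" means the policy outputs the next several actions per query; it is not the authors' transformer model, and `run_chunked`, `toy_policy`, and `toy_step` are hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def run_chunked(
    policy: Callable[[Vector, int], List[Vector]],
    step: Callable[[Vector, Vector], Vector],
    obs: Vector,
    horizon: int,
    chunk_size: int,
) -> Tuple[List[Vector], int]:
    """Roll out `policy` for `horizon` steps, querying it once per chunk.

    Returns the executed actions and the number of policy queries.
    """
    executed: List[Vector] = []
    queries = 0
    while len(executed) < horizon:
        # One query yields a whole sequence of `chunk_size` actions.
        chunk = policy(obs, chunk_size)
        queries += 1
        # Execute the chunk, truncating it if the horizon ends mid-chunk.
        for action in chunk[: horizon - len(executed)]:
            obs = step(obs, action)
            executed.append(action)
    return executed, queries


def toy_policy(obs: Vector, k: int) -> List[Vector]:
    """Hypothetical stand-in policy: k equal steps that move obs to zero."""
    return [[-x / k for x in obs] for _ in range(k)]


def toy_step(obs: Vector, action: Vector) -> Vector:
    """Hypothetical stand-in environment: the state is shifted by the action."""
    return [x + a for x, a in zip(obs, action)]


if __name__ == "__main__":
    for k in (1, 5):
        actions, queries = run_chunked(toy_policy, toy_step, [1.0, -2.0], 10, k)
        print(f"chunk_size={k}: {len(actions)} actions, {queries} policy queries")
```

In this sketch a larger `chunk_size` means fewer policy decisions over the same horizon, which gives one intuition for why predicting sequences could limit the compounding errors the abstract mentions.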