Lessons from Learning to Spin "Pens"

Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang
2024-10-24
Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real world dynamics. With less than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the challenging problem of in-hand manipulation of pen-like objects with a robotic hand, a skill that matters because many everyday tools, such as hammers and screwdrivers, are similarly shaped. Specifically, the authors aim to enable a robot hand to spin pen-like objects continuously by combining reinforcement learning with sim-to-real techniques. Current learning-based methods face two main issues in this task:

1. **Lack of high-quality demonstration data**: Demonstrations of such complex, dynamic motions are difficult to collect.
2. **Sim-to-real gap**: Policies that perform well in simulation often fail to transfer directly to the real world.

To solve these problems, the authors propose a method consisting of the following steps:

1. **Train an Oracle Policy using reinforcement learning**: Train a policy with access to privileged information and use it to generate high-quality trajectory data in simulation.
2. **Pre-train a Sensorimotor Policy**: Use this trajectory data to pre-train a sensorimotor policy in simulation.
3. **Fine-tune with real-world trajectory data**: Replay the simulated trajectories open-loop on the real robot, then fine-tune the pre-trained sensorimotor policy on the resulting real-world trajectories (fewer than 50) so that it adapts to real-world dynamics.

With this method, the authors enable a robot to rotate more than ten pen-like objects with different physical properties for multiple revolutions in the real world. The work is presented as the first learning-based method capable of continuously rotating pen-like objects in the real world.
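
The pre-train-then-fine-tune idea described above can be illustrated with a toy sketch. It is not the authors' implementation, and every name and number in it is hypothetical: the reinforcement-learning oracle is replaced by a fixed linear mapping `w_sim`, the sim-to-real gap is modeled as a random perturbation `w_real`, and the sensorimotor policy is a linear model trained by behavior cloning (assumed here as the supervised objective). The sketch only shows why a small real-world dataset can be enough once the policy has been pre-trained on plentiful simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 4  # hypothetical observation/action sizes


def make_dataset(expert_w, n_samples, noise=0.01):
    """Stand-in for recorded trajectories: observations and expert actions."""
    obs = rng.normal(size=(n_samples, OBS_DIM))
    act = obs @ expert_w + noise * rng.normal(size=(n_samples, ACT_DIM))
    return obs, act


def bc_loss(w, obs, act):
    """Mean squared error between policy actions and recorded actions."""
    return float(np.mean((obs @ w - act) ** 2))


def bc_train(w, obs, act, lr=0.5, steps=300):
    """Behavior cloning: gradient descent on the mean squared action error."""
    for _ in range(steps):
        err = obs @ w - act
        w = w - lr * 2.0 * obs.T @ err / err.size
    return w


# Toy assumption: actions that work on the real robot differ from those in sim.
w_sim = rng.normal(size=(OBS_DIM, ACT_DIM))
w_real = w_sim + 0.3 * rng.normal(size=(OBS_DIM, ACT_DIM))

# Stage 1 (stand-in for the RL oracle): a large simulated dataset.
sim_obs, sim_act = make_dataset(w_sim, n_samples=2000)

# Stage 2: pre-train the policy on the simulated data.
policy = bc_train(np.zeros((OBS_DIM, ACT_DIM)), sim_obs, sim_act)
sim_loss = bc_loss(policy, sim_obs, sim_act)

# Stage 3: fine-tune on a small "real" dataset, evaluate on held-out real data.
real_obs, real_act = make_dataset(w_real, n_samples=40)
test_obs, test_act = make_dataset(w_real, n_samples=500)
real_loss_before = bc_loss(policy, test_obs, test_act)
policy_ft = bc_train(policy, real_obs, real_act, steps=200)
real_loss_after = bc_loss(policy_ft, test_obs, test_act)

print(f"sim loss after pre-training:  {sim_loss:.4f}")
print(f"real loss before fine-tuning: {real_loss_before:.4f}")
print(f"real loss after fine-tuning:  {real_loss_after:.4f}")
```

In this toy setting the pre-trained policy fits the simulated data well but incurs a larger error on the "real" data, and fine-tuning on 40 samples closes most of that gap. The sample count is arbitrary and only loosely echoes the small real-world budget reported in the abstract.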