Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners?

Huaijiang Zhu, Tong Zhao, Xinpei Ni, Jiuguang Wang, Kuan Fang, Ludovic Righetti, Tao Pang
2024-12-13
Abstract: The tremendous success of behavior cloning (BC) in robotic manipulation has been largely confined to tasks where demonstrations can be effectively collected through human teleoperation. However, demonstrations for contact-rich manipulation tasks that require complex coordination of multiple contacts are difficult to collect due to the limitations of current teleoperation interfaces. We investigate how to leverage model-based planning and optimization to generate training data for contact-rich dexterous manipulation tasks. Our analysis reveals that popular sampling-based planners like rapidly exploring random tree (RRT), while efficient for motion planning, produce demonstrations with unfavorably high entropy. This motivates modifications to our data generation pipeline that prioritize demonstration consistency while maintaining solution diversity. Combined with a diffusion-based goal-conditioned BC approach, our method enables effective policy learning and zero-shot transfer to hardware for two challenging contact-rich manipulation tasks.
Robotics
### What problems does this paper attempt to solve?

This paper explores how to use model-based planning to generate training data for learning complex contact-rich manipulation skills. Specifically, it aims to solve the following problems:

1. **Data collection challenges for contact-rich manipulation tasks**:
   - For manipulation tasks that require coordinating multiple contact points, it is difficult for humans to provide high-quality demonstrations through teleoperation. Current teleoperation systems can only track the movements of the robot's end-effector and struggle to capture tasks involving full-arm contact and multi-finger coordination.
   - Data collection depends on the availability of human operators, so it is difficult to scale the way data collection for vision and language tasks does.
2. **The impact of high-entropy demonstration data on policy performance**:
   - Demonstrations generated by popular sampling-based planning algorithms such as RRT (Rapidly-exploring Random Trees) have high entropy, which leads to poor policy learning.
   - The paper shows experimentally that high-entropy demonstration data reduces policy performance, especially when little data is available.
3. **Design of an effective data generation pipeline**:
   - A data generation pipeline is needed that produces consistent, high-quality training data to support effective policy learning.
   - To address the high-entropy problem, the paper proposes a greedy search method and a global contact planner. These methods improve the consistency of demonstration data while ensuring planning integrity.
4. **Utilization of multi-modal and sub-optimal data**:
   - The paper proposes a goal-conditioned behavior cloning method based on a diffusion model, which can learn effectively from multi-modal and sub-optimal data.
   - Through hindsight goal relabeling, demonstrations that fail to reach the intended goal are treated as valid demonstrations of reaching the states they actually achieved.
5. **Zero-shot transfer to hardware**:
   - The paper studies how to transfer policies learned in simulation to real hardware zero-shot and evaluates their performance.

### Main contributions

- **Demonstrated the negative impact of high-entropy demonstration data on policy performance**: Experiments show that using inconsistent, high-entropy demonstration data significantly reduces policy performance.
- **Proposed a data generation pipeline**: This pipeline generates consistent training data, which supports more effective policy learning.
- **Introduced a goal-conditioned behavior cloning method**: This method combines a diffusion model with hindsight goal relabeling and can utilize multi-modal and sub-optimal data.

### Experimental verification

The paper validates its approach on two challenging contact-rich manipulation tasks:

1. **AllegroHand**: An in-hand object rotation task, using a 16-degree-of-freedom dexterous hand to reorient a cube.
2. **IiwaBimanual**: A two-arm manipulation task, using two robotic arms to rotate a large object by 180 degrees.

Through these experiments, the paper demonstrates the effectiveness and robustness of the proposed methods in simulation and on real hardware.
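
The hindsight goal relabeling idea mentioned under problem 4 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Sample` type, the function name `relabel_with_hindsight`, and the choice of the final reached state as the new goal are assumptions made for illustration. The summary does not say which achieved states the authors relabel with.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]   # hypothetical: object pose or any other state vector
Action = Tuple[float, ...]  # hypothetical: robot command


@dataclass(frozen=True)
class Sample:
    """One goal-conditioned training example: (state, goal) -> action."""
    state: State
    goal: State
    action: Action


def relabel_with_hindsight(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Sample]:
    """Relabel a trajectory with the goal it actually achieved.

    `states` holds one more entry than `actions`: states[t] is the state
    in which actions[t] was taken, and states[-1] is the final state.
    Assumption: the final state is used as the hindsight goal.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    achieved_goal = states[-1]
    return [
        Sample(state=s, goal=achieved_goal, action=a)
        for s, a in zip(states[:-1], actions)
    ]


if __name__ == "__main__":
    # A plan that aimed for 1.0 but stopped at 0.6 still yields valid
    # training data for the goal 0.6.
    demo_states = [(0.0,), (0.3,), (0.6,)]
    demo_actions = [(0.3,), (0.3,)]
    for sample in relabel_with_hindsight(demo_states, demo_actions):
        print(sample)
```

Under this assumption, a trajectory that misses its intended goal is never discarded. Every state-action pair is kept and conditioned on the state that was actually reached, which is how sub-optimal demonstrations can still contribute to goal-conditioned training.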