Abstract: The tremendous success of behavior cloning (BC) in robotic manipulation has been largely confined to tasks where demonstrations can be effectively collected through human teleoperation. However, demonstrations for contact-rich manipulation tasks that require complex coordination of multiple contacts are difficult to collect due to the limitations of current teleoperation interfaces. We investigate how to leverage model-based planning and optimization to generate training data for contact-rich dexterous manipulation tasks. Our analysis reveals that popular sampling-based planners like rapidly exploring random tree (RRT), while efficient for motion planning, produce demonstrations with unfavorably high entropy. This motivates modifications to our data generation pipeline that prioritizes demonstration consistency while maintaining solution diversity. Combined with a diffusion-based goal-conditioned BC approach, our method enables effective policy learning and zero-shot transfer to hardware for two challenging contact-rich manipulation tasks.
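
The entropy claim in the abstract can be made concrete with a toy calculation. The sketch below is not the paper's analysis: the action labels and both planner functions are hypothetical stand-ins, and entropy is measured as the empirical Shannon entropy of the action chosen at one fixed state across many generated demonstrations. A planner that always makes the same choice at a given state yields zero entropy, while one whose choice depends on random samples spreads its actions out.

```python
import math
import random
from collections import Counter

# Hypothetical discrete action labels, used only for illustration.
ACTIONS = ["rotate_left", "rotate_right", "regrasp", "push"]


def action_entropy(actions):
    """Empirical Shannon entropy (in bits) of a list of discrete actions."""
    counts = Counter(actions)
    total = len(actions)
    # p * log2(1 / p), written with total / c so a single outcome gives exactly 0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def greedy_planner(state, rng):
    """Stand-in for a consistent planner: same state, same action every time."""
    return ACTIONS[state % len(ACTIONS)]


def random_planner(state, rng):
    """Stand-in for a sampling-based planner: the action depends on random draws."""
    return rng.choice(ACTIONS)


def demo_entropy(planner, state, n_demos=1000, seed=0):
    """Entropy of the action taken at `state` across n_demos generated demonstrations."""
    rng = random.Random(seed)
    return action_entropy([planner(state, rng) for _ in range(n_demos)])


if __name__ == "__main__":
    print("greedy:", demo_entropy(greedy_planner, state=3))
    print("random:", demo_entropy(random_planner, state=3))
```

A behavior-cloning policy trained on the second kind of data has to fit a much broader action distribution at each state, which is the intuition behind preferring consistent demonstrations.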

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper explores how to use model-based planning to generate training data for learning complex contact-rich manipulation skills. Specifically, it addresses the following problems:

1. **Data collection challenges for contact-rich manipulation tasks**:
   - For manipulation tasks that require coordinating multiple contacts, it is difficult for humans to provide high-quality demonstrations through teleoperation. Current teleoperation systems track only the robot's end-effector motion and struggle to capture tasks involving whole-arm contact and multi-finger coordination.
   - Data collection depends on the availability of human operators and is difficult to scale in the way that data collection for vision and language tasks has scaled.
2. **The impact of high-entropy demonstration data on policy performance**:
   - Demonstrations generated by popular sampling-based planners such as RRT (rapidly exploring random tree) have high entropy, which leads to poor policy learning.
   - The paper shows experimentally that high-entropy demonstrations reduce policy performance, especially when little data is available.
3. **Design of an effective data generation pipeline**:
   - A data generation pipeline is needed that produces consistent, high-quality training data so that policies can be learned effectively.
   - To address the high-entropy problem, the paper proposes a greedy search method and a global contact planner. These methods improve the consistency of the demonstrations while preserving planning completeness.
4. **Utilization of multi-modal and sub-optimal data**:
   - The paper proposes a diffusion-based, goal-conditioned behavior cloning method that can learn effectively from multi-modal and sub-optimal data.
   - With hindsight goal relabeling, demonstrations that fail to reach the intended goal are treated as valid data for reaching the states they actually achieved.
5. **Zero-shot transfer to hardware**:
   - The paper studies how to transfer policies learned in simulation to real hardware zero-shot and evaluates their performance.

### Main contributions

- **Demonstrated the negative impact of high-entropy demonstration data on policy performance**: Experiments show that inconsistent, high-entropy demonstrations significantly reduce policy performance.
- **Proposed a data generation pipeline**: The pipeline generates consistent training data, which makes policy learning more effective.
- **Introduced a goal-conditioned behavior cloning method**: The method combines a diffusion model with hindsight goal relabeling and can make use of multi-modal and sub-optimal data.

### Experimental verification

The paper evaluates its methods on two challenging contact-rich manipulation tasks:

1. **AllegroHand**: An in-hand object rotation task in which a 16-degree-of-freedom dexterous hand reorients a cube.
2. **IiwaBimanual**: A bimanual manipulation task in which two robotic arms rotate a large object by 180 degrees.

Through these experiments, the paper demonstrates the effectiveness and robustness of the proposed methods in simulation and on real hardware.
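
The hindsight goal relabeling step mentioned in the summary above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Sample` type and the `relabel_with_hindsight` function are hypothetical names, states and actions are plain tuples, and the relabeled goal is taken to be the final state of the trajectory, which is one common choice. The summary does not say which relabeling scheme the authors use.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass(frozen=True)
class Sample:
    """One goal-conditioned training tuple (hypothetical container)."""
    state: State
    goal: State
    action: Action


def relabel_with_hindsight(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Sample]:
    """Turn a trajectory into training samples whose goal is the state it actually reached.

    `states` must hold one more entry than `actions`: actions[t] moves the
    system from states[t] to states[t + 1].
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    achieved_goal = states[-1]  # assumption: relabel with the final achieved state
    return [
        Sample(state=s, goal=achieved_goal, action=a)
        for s, a in zip(states[:-1], actions)
    ]


if __name__ == "__main__":
    # A plan that aimed for 1.0 but stopped at 0.7 still yields valid
    # "how to reach 0.7" supervision after relabeling.
    trajectory_states = [(0.0,), (0.4,), (0.7,)]
    trajectory_actions = [(0.4,), (0.3,)]
    for sample in relabel_with_hindsight(trajectory_states, trajectory_actions):
        print(sample)
```

Because the goal is read from the trajectory itself, every planner rollout contributes training data, including rollouts that stopped short of the originally requested goal.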

Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners?

Human Demonstration Trajectory Refinement for Redundant Manipulators

Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation

Contact Optimization with Learning from Demonstration: Application in Long-term Non-prehensile Planar Manipulation

Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Robust Manipulation Primitive Learning via Domain Contraction

Efficient Robot Skill Learning with Imitation from a Single Video for Contact-Rich Fabric Manipulation

Contact-Implicit Model Predictive Control for Dexterous In-hand Manipulation: A Long-Horizon and Robust Approach

Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation

Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation

MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation

Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

DROP: Dexterous Reorientation via Online Planning

Admittance Visuomotor Policy Learning for General-Purpose Contact-Rich Manipulations

Enhancing Dexterity in Robotic Manipulation via Hierarchical Contact Exploration

A Novel Contact State Estimation Method for Robot Manipulation Skill Learning Via Environment Dynamics and Constraints Modeling

Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

Diffusion-Informed Probabilistic Contact Search for Multi-Finger Manipulation

Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning