Abstract: Contact-rich bimanual manipulation involves precise coordination of two arms to change object states through strategically selected contacts and motions. Due to the inherent complexity of these tasks, acquiring sufficient demonstration data and training policies that generalize to unseen scenarios remain a largely unresolved challenge. Building on recent advances in planning through contacts, we introduce Generalizable Planning-Guided Diffusion Policy Learning (GLIDE), an approach that effectively learns to solve contact-rich bimanual manipulation tasks by leveraging model-based motion planners to generate demonstration data in high-fidelity physics simulation. Through efficient planning in randomized environments, our approach generates large-scale and high-quality synthetic motion trajectories for tasks involving diverse objects and transformations. We then train a task-conditioned diffusion policy via behavior cloning using these demonstrations. To tackle the sim-to-real gap, we propose a set of essential design options in feature extraction, task representation, action prediction, and data augmentation that enable learning robust prediction of smooth action sequences and generalization to unseen scenarios. Through experiments in both simulation and the real world, we demonstrate that our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties. Website: https://glide-manip.github.io/

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses **the challenges of complex object manipulation by dual-arm robots in contact-rich settings**. Specifically, it focuses on how to enable a robot to change the state of an object by coordinating two robotic arms through multiple contact points, especially when the objects have diverse geometries and physical properties. Because these tasks are inherently complex (for example, they require long-horizon, multi-stage contact and manipulation), obtaining sufficient demonstration data and training policies that generalize to unseen scenarios remains an unsolved problem.

#### Main problems include:

1. **Obtaining high-quality demonstration data**: For complex dual-arm manipulation tasks, collecting expert demonstration data in the real world is both difficult and expensive.
2. **The gap between simulation and the real world (sim-to-real gap)**: Policies trained on simulation data face differences in perception and dynamics when deployed in the real world.
3. **Generalization ability**: The learned policies should apply to unseen objects and environments, not just the specific objects in the training set.

To solve these problems, the paper proposes the **Generalizable Planning-Guided Diffusion Policy Learning (GLIDE)** method. The core idea of GLIDE is to use a model-based motion planner to generate large-scale, high-quality synthetic trajectory data in high-fidelity physics simulation, and then to train a conditional diffusion policy network through behavior cloning so that it can predict a smooth sequence of actions from the observed point cloud and the task description. In addition, GLIDE introduces a series of design choices that improve the policy's transfer from simulation to the real world and its generalization to unseen scenarios.
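As a rough illustration of the behavior-cloning step described above, the sketch below builds one training example for a denoising diffusion model from a planner-generated action chunk. It assumes the standard DDPM noise-prediction formulation with a linear noise schedule; the source does not state which diffusion parameterization GLIDE uses, and the function names, schedule values, and the sample action chunk are all hypothetical. The conditioning on point clouds and task descriptions is omitted.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_k) for a linear noise schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for k in range(num_steps):
        frac = k / max(num_steps - 1, 1)
        beta = beta_start + (beta_end - beta_start) * frac
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_actions(actions, alpha_bar, rng):
    """Corrupt a flat action sequence; return (noisy_actions, noise_that_was_added)."""
    noise = [rng.gauss(0.0, 1.0) for _ in actions]
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    noisy = [keep * a + spread * e for a, e in zip(actions, noise)]
    return noisy, noise


def denoising_loss(predicted_noise, target_noise):
    """Mean squared error between the network's noise estimate and the true noise."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, target_noise)]
    return sum(squared) / len(target_noise)


rng = random.Random(0)
alpha_bars = linear_alpha_bars(100)
demo_actions = [0.10, -0.05, 0.20, 0.00]  # made-up planner action chunk
k = rng.randrange(len(alpha_bars))        # random diffusion step for this example
noisy, target = noise_actions(demo_actions, alpha_bars[k], rng)
# A policy network would take (noisy, k, observation, task) and be trained
# to minimize denoising_loss(network_output, target).
```

In this formulation the planner's trajectories serve only as clean targets, so the amount of training data is limited by planning throughput in simulation and not by human demonstration effort.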
### Formula summary

- **Objective function**:
  \[ \min_{q_u^+, a} (q_u^+ - q_u^{\text{goal}})^T Q (q_u^+ - q_u^{\text{goal}}) + (a - q_a)^T R (a - q_a) \]
  where \( q_u^+ = f_{\text{local}}(q_u, q_a, a) \) is the approximate configuration of the object after the robot executes action \( a \), and \( Q \) and \( R \) are user-specified cost matrices.
- **Action sequence prediction**:
  \[ a_{t+1:t+T_a} = \{q_i - q_t\}_{i=t+1}^{t+T_a} \]
  where \( T_a \) is the number of predicted time steps and \( q_t \) is the current joint position, so each predicted action is a joint position expressed relative to \( q_t \).

### Conclusion

By combining efficient motion planning with diffusion policy learning, GLIDE addresses multiple challenges in contact-rich dual-arm manipulation tasks and demonstrates its effectiveness and generalization ability in both simulation and the real world.
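The two formulas in the formula summary can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the local dynamics model \( f_{\text{local}} \) is not implemented (its output \( q_u^+ \) is passed in directly), and the numbers in the usage lines are made up.

```python
def quadratic_form(x, M):
    """Compute x^T M x for a vector x and a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def planning_cost(q_u_next, q_u_goal, a, q_a, Q, R):
    """Goal-tracking cost on the object plus a penalty on the commanded robot motion."""
    goal_error = [p - g for p, g in zip(q_u_next, q_u_goal)]
    action_step = [ai - qi for ai, qi in zip(a, q_a)]
    return quadratic_form(goal_error, Q) + quadratic_form(action_step, R)


def relative_actions(q_traj, t, T_a):
    """Return [q_i - q_t for i = t+1 .. t+T_a], one list of joint offsets per step."""
    if t < 0 or t + T_a >= len(q_traj):
        raise ValueError("trajectory is too short for the requested horizon")
    q_t = q_traj[t]
    return [
        [qi - qt for qi, qt in zip(q_traj[i], q_t)]
        for i in range(t + 1, t + T_a + 1)
    ]


# Usage with made-up two-dimensional values.
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.5, 0.0], [0.0, 0.5]]
cost = planning_cost([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0], Q, R)

joint_traj = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
chunk = relative_actions(joint_traj, t=1, T_a=2)
```

Expressing actions as offsets from \( q_t \) keeps the prediction targets centered near zero regardless of where the arms currently are.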

Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation

Diffusion-Informed Probabilistic Contact Search for Multi-Finger Manipulation

Admittance Visuomotor Policy Learning for General-Purpose Contact-Rich Manipulations

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

Combining Planning and Diffusion for Mobility with Unknown Dynamics

Contact-Implicit Model Predictive Control for Dexterous In-hand Manipulation: A Long-Horizon and Robust Approach

Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

Object-Centric Dexterous Manipulation from Human Motion Data

Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation

Plan-Guided Reinforcement Learning for Whole-Body Manipulation

Diff-LfD: Contact-aware Model-based Learning from Visual Demonstration for Robotic Manipulation Via Differentiable Physics-based Simulation and Rendering

In-Hand Re-grasp Manipulation with Passive Dynamic Actions via Imitation Learning

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Learning Playing Piano with Bionic-Constrained Diffusion Policy for Anthropomorphic Hand

Learning Robotic Manipulation through Visual Planning and Acting

Learning Generalizable Dexterous Manipulation from Human Grasp Affordance

Adaptive Motion Planning for Multi-fingered Functional Grasp via Force Feedback

Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance