Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

Malek Aburub, Cristian C. Beltran-Hernandez, Tatsuya Kamijo, Masashi Hamaya
2024-10-25
Abstract: Robots hold great promise for performing repetitive or hazardous tasks, but achieving human-like dexterity, especially in contact-rich and dynamic environments, remains challenging. Rigid robots, which rely on position or velocity control, often struggle with maintaining stable contact and applying consistent force in force-intensive tasks. Learning from Demonstration has emerged as a solution, but tasks requiring intricate maneuvers, such as powder grinding, present unique difficulties. This paper introduces Diffusion Policies For Compliant Manipulation (DIPCOM), a novel diffusion-based framework designed for compliant control tasks. By leveraging generative diffusion models, we develop a policy that predicts Cartesian end-effector poses and adjusts arm stiffness to maintain the necessary force. Our approach enhances force control through multimodal distribution modeling, improves the integration of diffusion policies in compliance control, and extends our previous work by demonstrating its effectiveness in real-world tasks. We present a detailed comparison between our framework and existing methods, highlighting the advantages and best practices for deploying diffusion-based compliance control.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
The core problem this paper addresses is: how can robots achieve human-like dexterous manipulation through Learning from Demonstration (LfD) in contact-rich, dynamic environments, especially in tasks that require fine force control? Rigid robots that rely on position or velocity control have difficulty maintaining stable contact and applying consistent force, particularly in complex tasks that involve long-duration, repetitive motions, such as powder grinding.

To address this, the authors propose Diffusion Policies For Compliant Manipulation (DIPCOM), a framework that uses a generative diffusion model to handle compliant control tasks. Its main contributions are:

1. **New Diffusion Model Framework**: A diffusion-based framework for learning complex contact-rich manipulation from demonstrations. Because the diffusion model can capture multimodal action distributions, it improves force control.
2. **Improved Compliant Control Strategy**: By combining the diffusion model with a compliant controller, DIPCOM predicts the Cartesian end-effector pose and adjusts the stiffness of the robot arm to apply the necessary force. This lets the robot adapt the applied force while maintaining accuracy.
3. **Comparative Experiments**: On a series of challenging real-world tasks (powder grinding, erasing pencil marks with an eraser, and dual-arm insertion of a cylindrical plug), the paper demonstrates the advantages of DIPCOM over previous methods such as Comp-ACT and summarizes best practices.

### Key Formulas

- **Mean Squared Error Loss Function**: \[ L_{\text{sample}} = \|a_0^t - \hat{a}_0^t\|^2 \] where \(a_0^t\) is the demonstrated (ground-truth) action and \(\hat{a}_0^t\) is the predicted action.
- **Iterative Prediction Formula in the Denoising Process**: \[ a_{n-1}^t = \sqrt{\beta_{n-1}}\,\hat{a}_0^t + \sqrt{1 - \beta_{n-1}} \cdot \frac{a_n^t - \sqrt{\beta_n}\,\hat{a}_0^t}{\sqrt{1 - \beta_n}} \] where \(a_n^t\) is the noisy action at denoising step \(n\), \(\beta_n\) is the cumulative noise scale at step \(n\), and \(\hat{a}_0^t\) is the estimated original (noise-free) action.

### Summary

The main purpose of the paper is to use a diffusion model to let robots complete complex contact-rich tasks more flexibly and accurately, especially tasks that require sustained, continuous force control. DIPCOM is reported to improve task success rates, to produce more diverse behaviors, and to imitate human demonstrations more closely.
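
To illustrate the two expressions listed under Key Formulas, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the denoising update is the deterministic, DDIM-style form, in which the noise implied by the current sample, \((a_n^t - \sqrt{\beta_n}\,\hat{a}_0^t)/\sqrt{1 - \beta_n}\), is rescaled to the previous step. It also assumes \(\beta_0 = 1\) (no noise left at step 0). The function names, the 8-dimensional action vector, the linear schedule, and the oracle predictor standing in for a trained network are all hypothetical.

```python
import numpy as np


def sample_loss(a0, a0_hat):
    """Squared L2 distance between the demonstrated and predicted action."""
    return float(np.sum((a0 - a0_hat) ** 2))


def denoise_step(a_n, a0_hat, beta_n, beta_prev):
    """One deterministic denoising step from a_n to a_{n-1}.

    beta_n and beta_prev are the cumulative noise scales at steps n and n-1.
    Assumption: 0 < beta_n < 1 for n >= 1, and beta_0 = 1.
    """
    # Noise implied by the current sample and the clean-action estimate.
    eps_hat = (a_n - np.sqrt(beta_n) * a0_hat) / np.sqrt(1.0 - beta_n)
    # Re-noise the estimate to the (lower) noise level of step n-1.
    return np.sqrt(beta_prev) * a0_hat + np.sqrt(1.0 - beta_prev) * eps_hat


def run_denoising(a_start, predict_a0, betas):
    """Iterate denoise_step from n = len(betas) - 1 down to n = 1."""
    a = a_start
    for n in range(len(betas) - 1, 0, -1):
        a0_hat = predict_a0(a, n)  # stands in for the trained policy network
        a = denoise_step(a, a0_hat, betas[n], betas[n - 1])
    return a


# Toy check with a hypothetical 8-dimensional action (e.g. pose plus stiffness).
rng = np.random.default_rng(0)
a0_true = rng.normal(size=8)
betas = np.linspace(1.0, 0.05, 11)  # hypothetical schedule, betas[0] = 1
noise = rng.normal(size=8)
a_noisy = np.sqrt(betas[-1]) * a0_true + np.sqrt(1.0 - betas[-1]) * noise

# Oracle predictor: always returns the true action, so the chain must recover it.
a_final = run_denoising(a_noisy, lambda a, n: a0_true, betas)
print(sample_loss(a0_true, a_final))
```

With the oracle predictor, every step recovers the same noise vector and the final sample matches the true action up to floating-point error, which makes this a quick sanity check of the update rule. In a real policy, `predict_a0` would be a learned network conditioned on observations, and its output would be trained with `sample_loss`.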