ALDM-Grasping: Diffusion-aided Zero-Shot Sim-to-Real Transfer for Robot Grasping

Yiwei Li, Zihao Wu, Huaqin Zhao, Tianze Yang, Zhengliang Liu, Peng Shu, Jin Sun, Ramviyas Parasuraman, Tianming Liu

2024-03-18

Abstract: To tackle the "reality gap" encountered in Sim-to-Real transfer, this study proposes a diffusion-based framework that minimizes inconsistencies in grasping actions between simulation settings and realistic environments. The process begins by training an adversarial supervision layout-to-image diffusion model (ALDM). The ALDM is then used to enhance the simulation environment, rendering it with photorealistic fidelity and thereby improving training for robotic grasping tasks. Experimental results indicate that this framework outperforms existing models in both success rate and adaptability to new environments, owing to improvements in the accuracy and reliability of visual grasping actions under a variety of conditions. Specifically, it achieves a 75% success rate in grasping tasks against plain backgrounds and maintains a 65% success rate in more complex scenarios. These results demonstrate that the framework excels at generating controlled image content based on text descriptions, identifying object grasp points, and performing zero-shot learning in complex, unseen scenarios.

Robotics

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the "reality gap" in robotic grasping tasks. Specifically:

1. **Data Acquisition Challenge**: In robotic vision grasping tasks, obtaining real-world datasets for training deep learning models is often costly and time-consuming, and sometimes infeasible.
2. **Sim-to-Real Transfer**: Researchers have developed Sim-to-Real strategies to transfer models from simulation environments to real environments, using techniques such as domain randomization and domain adaptation. Domain randomization diversifies visual elements in training simulations (such as textures and colors), so that the model focuses on invariant features that also hold in real scenarios; domain adaptation adjusts a model trained in the simulated environment to the real environment.
3. **Limitations of Image Generation Models**: Traditional Generative Adversarial Networks (GANs), although they perform well in image generation, require a large amount of training data and need retraining when facing new tasks or scenes, which limits their flexibility. Diffusion models, in contrast, allow more effective control over the generated image content.

To address these issues, the paper proposes a diffusion-based framework called ALDM-Grasping. The framework uses an Adversarially Supervised Layout-to-Image Diffusion Model (ALDM) to improve the visual appearance of the simulation environment, making it more consistent with the real environment. Experimental results show that the framework achieves a grasping success rate of 75% against simple backgrounds and 65% against complex backgrounds, demonstrating strong zero-shot performance and adaptability to new environments.
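
The domain randomization idea mentioned in the summary (varying textures and colors while keeping the task itself fixed) can be illustrated with a minimal sketch. This is not the paper's pipeline or ALDM itself: the `SceneAppearance` fields, the parameter ranges, and the label format are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneAppearance:
    """Visual parameters of one simulated scene (hypothetical fields)."""
    table_rgb: tuple
    object_rgb: tuple
    texture_id: int
    light_intensity: float


def sample_appearance(rng, num_textures=10):
    """Draw a random appearance; geometry and grasp labels are not touched."""
    return SceneAppearance(
        table_rgb=(rng.random(), rng.random(), rng.random()),
        object_rgb=(rng.random(), rng.random(), rng.random()),
        texture_id=rng.randrange(num_textures),
        light_intensity=rng.uniform(0.5, 1.5),  # assumed range, arbitrary units
    )


def randomized_dataset(grasp_labels, variants_per_label, seed=0):
    """Pair every grasp label with several randomly sampled appearances.

    The label is the invariant the model should learn; the appearance is
    the nuisance variation it should learn to ignore.
    """
    rng = random.Random(seed)
    return [
        (sample_appearance(rng), label)
        for label in grasp_labels
        for _ in range(variants_per_label)
    ]


if __name__ == "__main__":
    # Hypothetical grasp labels: (x, y, gripper angle in radians).
    labels = [(0.10, 0.20, 0.00), (0.40, 0.50, 1.57)]
    for appearance, label in randomized_dataset(labels, variants_per_label=2):
        print(label, appearance)
```

Each label appears under several appearances, so a model trained on renders of these scenes sees the same grasp target with different colors, textures, and lighting. The framework summarized above takes a different route, using a diffusion model to make simulated images look realistic rather than randomizing them.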
