Abstract: Recently, diffusion policy has shown impressive results in handling multi-modal tasks in robotic manipulation. However, it has fundamental limitations in out-of-distribution failures that persist due to compounding errors and its limited capability to extrapolate. One way to address these limitations is robot-gated DAgger, an interactive imitation learning with a robot query system to actively seek expert help during policy rollout. While robot-gated DAgger has high potential for learning at scale, existing methods like Ensemble-DAgger struggle with highly expressive policies: They often misinterpret policy disagreements as uncertainty at multi-modal decision points. To address this problem, we introduce Diff-DAgger, an efficient robot-gated DAgger algorithm that leverages the training objective of diffusion policy. We evaluate Diff-DAgger across different robot tasks including stacking, pushing, and plugging, and show that Diff-DAgger improves the task failure prediction by 37%, the task completion rate by 14%, and reduces the wall-clock time by up to 540%. We hope that this work opens up a path for efficiently incorporating expressive yet data-hungry policies into interactive robot learning settings. Project website: http://diffdagger.github.io

What problem does this paper attempt to address?

This paper addresses the following problem: in robot manipulation, imitation learning methods based on diffusion policy can handle multi-modal tasks, but they still suffer from performance degradation and compounding errors when they encounter out-of-distribution (OOD) data. In addition, existing robot-gated methods such as Ensemble-DAgger tend to misclassify in-distribution data as OOD at multi-modal decision points, which leads to unnecessary expert intervention or to help not being requested in time. To address these problems, the paper proposes the Diff-DAgger algorithm, which aims to:

1. **Improve the accuracy of OOD detection**: Use the loss function of the diffusion model to estimate the uncertainty of the action plan, so that OOD situations are identified more accurately.
2. **Reduce unnecessary expert intervention**: Tune the query system so that expert help is requested only when it is truly needed, avoiding interventions that come too early or too late.
3. **Accelerate the learning process**: Improve the efficiency of training and inference to reduce the overall training time.

### Specific problem description

- **OOD failure problem**: Under behavior cloning, existing diffusion policies are prone to OOD failures when facing unseen states: prediction errors compound, and the policy cannot extrapolate to new states.
- **Multi-modal data processing**: Existing robot-gated DAgger methods (such as Ensemble-DAgger) perform poorly on multi-modal data, especially when a task can be completed in several ways, and are prone to mistaking disagreement between valid modes for uncertainty.
- **Interactive learning efficiency**: How to reduce the number of expert interventions and speed up learning while maintaining the task success rate.
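
The misjudgment at multi-modal decision points can be illustrated with a toy example. The sketch below is not taken from the paper: the ensemble outputs, the `ensemble_variance` helper, and the `threshold` value are all hypothetical, chosen only to show why variance across ensemble members can be high at a state that is fully in-distribution but admits two valid actions.

```python
import numpy as np


def ensemble_variance(actions: np.ndarray) -> float:
    """Mean per-dimension variance across ensemble members (one row per member)."""
    return float(np.var(actions, axis=0).mean())


# Hypothetical 1-D actions predicted by five ensemble members.
# Unimodal state: every member predicts roughly the same action.
unimodal = np.array([[0.50], [0.52], [0.49], [0.51], [0.50]])

# Multi-modal state (e.g. pass an obstacle on the left or on the right):
# each member commits to one of two equally valid modes.
bimodal = np.array([[-1.0], [1.0], [1.0], [-1.0], [1.0]])

threshold = 0.05  # hypothetical variance threshold for querying the expert

query_unimodal = ensemble_variance(unimodal) > threshold  # no query
query_bimodal = ensemble_variance(bimodal) > threshold    # spurious query

print(query_unimodal, query_bimodal)
```

Under these assumptions the variance-based gate asks for help at the bimodal state even though every individual member proposes a valid action, which is the failure mode the summary attributes to ensemble-style gating.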
### Solution overview

The Diff-DAgger algorithm proposed in the paper addresses these problems with a query system based on the diffusion model's loss function. Specifically:

- **Diffusion loss for OOD detection**: Decide whether the current state is OOD by computing the diffusion loss of the generated action and comparing it with the loss expected on the training data.
- **Efficient expert query mechanism**: Request expert help only when the diffusion loss exceeds the set threshold for several consecutive time steps, which reduces the false positive rate.
- **Accelerated training and inference**: Shorten training and inference time significantly by adjusting parameters such as the prediction type and the number of diffusion steps.

### Experimental verification

The paper evaluates Diff-DAgger on three robot tasks (stacking, pushing, and plugging). The results show that it outperforms existing methods in task failure prediction, task completion rate, and training time.

### Summary

The core objective of this paper is to overcome the limitations of existing imitation learning methods on OOD data and multi-modal tasks by introducing an efficient query system based on the diffusion model's loss function, thereby improving the success rate and learning efficiency of robot manipulation tasks.
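
The query mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `diffusion_loss`, `ConsecutiveGate`, `toy_eps_model`, the noise schedule `alphas_bar`, the quantile used for the threshold, and all numeric values are hypothetical. The toy model stands in for a trained noise-prediction network and has simply "memorized" a single action.

```python
import numpy as np


def diffusion_loss(eps_model, obs, action, alphas_bar, rng, n_samples=16):
    """Monte Carlo estimate of an epsilon-prediction denoising loss for `action`."""
    losses = []
    for _ in range(n_samples):
        t = int(rng.integers(len(alphas_bar)))       # random diffusion step
        eps = rng.standard_normal(action.shape)      # noise to be predicted
        noisy = np.sqrt(alphas_bar[t]) * action + np.sqrt(1.0 - alphas_bar[t]) * eps
        losses.append(np.mean((eps_model(obs, noisy, t) - eps) ** 2))
    return float(np.mean(losses))


class ConsecutiveGate:
    """Signals a query only after `patience` consecutive losses above `threshold`."""

    def __init__(self, threshold: float, patience: int):
        self.threshold = threshold
        self.patience = patience
        self.count = 0

    def step(self, loss: float) -> bool:
        self.count = self.count + 1 if loss > self.threshold else 0
        return self.count >= self.patience


# Hypothetical noise schedule and toy "trained" model.
alphas_bar = np.linspace(0.99, 0.01, 10)
known_action = np.array([0.3, -0.2])


def toy_eps_model(obs, noisy, t):
    # Assumes the clean action is always `known_action` and solves for the noise.
    return (noisy - np.sqrt(alphas_bar[t]) * known_action) / np.sqrt(1.0 - alphas_bar[t])


rng = np.random.default_rng(0)
in_dist = diffusion_loss(toy_eps_model, None, known_action, alphas_bar, rng)
off_dist = diffusion_loss(toy_eps_model, None, known_action + 1.0, alphas_bar, rng)

# Threshold from a quantile of (hypothetical) training-set losses.
train_losses = np.array([0.01, 0.02, 0.015, 0.03, 0.02])
threshold = float(np.quantile(train_losses, 0.9))

# Hypothetical per-step losses during a rollout; isolated spikes are ignored.
gate = ConsecutiveGate(threshold, patience=3)
rollout_losses = [0.01, 0.5, 0.02, 0.5, 0.6, 0.7]
decisions = [gate.step(loss) for loss in rollout_losses]
print(in_dist, off_dist, decisions)
```

In this toy setting the loss is near zero for the action the model has seen and clearly larger for a shifted one, and the gate fires only at the last step, after three consecutive losses above the threshold. Requiring consecutive violations is one simple way to suppress single-step false positives, at the cost of a short delay before the expert is queried.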

Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

EnsembleDAgger: A Bayesian Approach to Safe Imitation Learning

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance

Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks

Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback

Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

HG-DAgger: Interactive Imitation Learning with Human Experts

DropoutDAgger: A Bayesian Approach to Safe Imitation Learning

MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Prediction with Action: Visual Policy Learning via Joint Denoising Process