Abstract: Diffusion-based policies have shown impressive performance in robotic manipulation tasks while struggling with out-of-domain distributions. Recent efforts attempted to enhance generalization by improving the visual feature encoding for diffusion policy. However, their generalization is typically limited to the same category with similar appearances. Our key insight is that leveraging affordances--manipulation priors that define "where" and "how" an agent interacts with an object--can substantially enhance generalization to entirely unseen object instances and categories. We introduce the Diffusion Policy with transferable Affordance (AffordDP), designed for generalizable manipulation across novel categories. AffordDP models affordances through 3D contact points and post-contact trajectories, capturing the essential static and dynamic information for complex tasks. The transferable affordance from in-domain data to unseen objects is achieved by estimating a 6D transformation matrix using foundational vision models and point cloud registration techniques. More importantly, we incorporate affordance guidance during diffusion sampling that can refine action sequence generation. This guidance directs the generated action to gradually move towards the desired manipulation for unseen objects while keeping the generated action within the manifold of action space. Experimental results from both simulated and real-world environments demonstrate that AffordDP consistently outperforms previous diffusion-based methods, successfully generalizing to unseen instances and categories where others fail.
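The affordance-transfer step described in the abstract can be illustrated with a minimal sketch. It assumes that point correspondences between a seen object and an unseen object are already available (in the paper these would come from foundation vision models and point cloud registration; that part is not reproduced here), and it uses the standard Kabsch least-squares solution for the rigid transform. The function names `estimate_rigid_transform` and `transfer_points` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed given).
    Returns a 4x4 homogeneous matrix T with dst ~= R @ src + t.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transfer_points(T, pts):
    """Apply a 4x4 rigid transform to (N, 3) points, e.g. a contact
    point or the waypoints of a post-contact trajectory."""
    return pts @ T[:3, :3].T + T[:3, 3]
```

Under these assumptions, a contact point and post-contact trajectory recorded on a training object could be mapped onto a new object with `transfer_points(T, trajectory)`, where `T` is estimated from the matched points of the two objects.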

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the difficulty that existing diffusion-based imitation learning methods have in generalizing to unseen object instances and categories (out-of-domain distributions) in robot manipulation tasks. Specifically:

1. **Limitations of existing methods**:
   - Although diffusion-based policies perform well in robot manipulation tasks, they perform poorly on tasks outside the distribution of the training data.
   - Existing methods rely mainly on improving visual feature encoding. These improvements usually generalize only to objects of the same category with similar appearances, and cannot effectively handle objects of entirely different categories or shapes.

2. **Introducing transferable manipulation priors (affordances)**:
   - The authors propose using affordances, that is, manipulation priors that define "where" and "how" to interact with objects, to improve the model's generalization.
   - These priors help the model capture how different objects are interacted with, enabling effective manipulation of unseen object instances and categories.

3. **Specific questions**:
   - How can the manipulation priors of known objects be transferred effectively to unseen objects?
   - How can the generated action sequences be made to conform to the overall action distribution while also meeting the requirements of a specific task, especially in high-precision tasks (such as grasping a doorknob to open a door)?

By introducing **AffordDP** (Diffusion Policy with transferable Affordance), the authors aim to solve these problems and achieve effective generalization in complex manipulation tasks, especially for unseen object instances and categories.
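The tension in the last question, staying on the learned action distribution while steering toward a task-specific target, can be illustrated with a toy guided-sampling loop. This is only a sketch under strong assumptions: `toy_denoise_step` is a hypothetical stand-in for a learned denoiser, the cost is a simple squared distance to a target trajectory, and neither the update rule nor the parameter names are taken from the paper.

```python
import numpy as np

def affordance_cost_grad(actions, target_traj):
    """Gradient of 0.5 * ||actions - target_traj||^2 w.r.t. actions."""
    return actions - target_traj

def toy_denoise_step(actions, t):
    """Hypothetical placeholder for one learned denoising step.
    Here it simply shrinks the sample toward zero."""
    return 0.9 * actions

def guided_sampling(denoise_step, target_traj, num_steps=50, scale=0.1, seed=0):
    """Alternate a denoising step with a gradient step on the affordance cost.

    scale controls how strongly each step is pulled toward target_traj;
    scale=0.0 recovers plain (unguided) sampling.
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=target_traj.shape)  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        actions = denoise_step(actions, t)
        actions = actions - scale * affordance_cost_grad(actions, target_traj)
    return actions
```

In this toy setting the denoiser and the guidance term pull in different directions, so the result settles between what the denoiser prefers and the target trajectory. A larger `scale` moves the sample closer to the target at the cost of drifting further from what the denoiser alone would produce.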

### Summary

The main goal of this paper is to develop a new robot manipulation policy, **AffordDP**, that combines transferable manipulation priors (affordances) with diffusion models to improve generalization to unseen object instances and categories. The approach addresses the generalization limits of existing diffusion-based policies and offers a more flexible solution for robot manipulation tasks.

AffordDP: Generalizable Diffusion Policy with Transferable Affordance
