AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang, Kaizhen Zhu, Jiayuan Gu, Jingyi Yu, Ye Shi, Jingya Wang
2024-12-04
Abstract: Diffusion-based policies have shown impressive performance in robotic manipulation tasks while struggling with out-of-domain distributions. Recent efforts attempted to enhance generalization by improving the visual feature encoding for diffusion policy. However, their generalization is typically limited to the same category with similar appearances. Our key insight is that leveraging affordances--manipulation priors that define "where" and "how" an agent interacts with an object--can substantially enhance generalization to entirely unseen object instances and categories. We introduce the Diffusion Policy with transferable Affordance (AffordDP), designed for generalizable manipulation across novel categories. AffordDP models affordances through 3D contact points and post-contact trajectories, capturing the essential static and dynamic information for complex tasks. The transferable affordance from in-domain data to unseen objects is achieved by estimating a 6D transformation matrix using foundational vision models and point cloud registration techniques. More importantly, we incorporate affordance guidance during diffusion sampling that can refine action sequence generation. This guidance directs the generated action to gradually move towards the desired manipulation for unseen objects while keeping the generated action within the manifold of action space. Experimental results from both simulated and real-world environments demonstrate that AffordDP consistently outperforms previous diffusion-based methods, successfully generalizing to unseen instances and categories where others fail.
Robotics
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the difficulty that existing diffusion-based imitation learning methods have in generalizing to unseen object instances and categories (out-of-domain distributions) in robot manipulation tasks. Specifically:

1. **Limitations of existing methods**:
   - Diffusion-based policies perform well in robot manipulation tasks but poorly on inputs outside the distribution of their training data.
   - Existing methods mainly rely on improved visual feature encoding. These improvements usually generalize only to objects of the same category with similar appearances and cannot effectively handle objects of entirely different categories or shapes.
2. **Introducing transferable affordances**:
   - The authors propose using affordances, manipulation priors that define "where" and "how" to interact with an object, to enhance the generalization ability of the policy.
   - These priors describe how an object is meant to be interacted with, which lets the policy manipulate unseen object instances and categories.
3. **Specific problem description**:
   - How can the affordances of known objects be transferred effectively to unseen objects?
   - How can the generated action sequences both stay within the overall action distribution and meet the requirements of the specific task, especially in high-precision tasks (such as grasping a doorknob to open a door)?

By introducing **AffordDP** (Diffusion Policy with transferable Affordance), the authors aim to solve these problems and achieve effective generalization on complex manipulation tasks, especially for unseen object instances and categories.
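The two questions above can be made concrete with a small, purely illustrative Python sketch. It is not the authors' implementation. According to the abstract, the paper estimates the 6D transformation with foundational vision models and point cloud registration; here the transform is simply assumed to be given, and all function names and numeric values are hypothetical. `guided_step` is a toy stand-in for affordance guidance: a single gradient step that pulls an action toward the transferred contact point, whereas the paper applies its guidance inside the diffusion sampling process.

```python
import math


def make_transform(yaw, translation):
    """Build a 4x4 homogeneous transform: rotate about z by `yaw` (radians), then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(T, point):
    """Map a 3D point through the homogeneous transform T."""
    x, y, z = point
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def transfer_affordance(T, contact_point, trajectory):
    """Re-express a contact point and its post-contact waypoints on the new object."""
    return apply_transform(T, contact_point), [apply_transform(T, p) for p in trajectory]


def guided_step(action, target, scale):
    """One gradient step on 0.5 * ||action - target||^2, pulling the action toward the target."""
    return tuple(a - scale * (a - t) for a, t in zip(action, target))


# Hypothetical transform from a known object to an unseen one (assumed given here).
T = make_transform(math.pi / 2, (0.5, 0.0, 0.1))

# Hypothetical affordance on the known object: a contact point and two later waypoints.
contact, traj = transfer_affordance(
    T, (0.2, 0.0, 0.3), [(0.2, 0.1, 0.3), (0.2, 0.2, 0.3)]
)

# Nudge a candidate end-effector position toward the transferred contact point.
nudged = guided_step((0.0, 0.0, 0.0), contact, 0.25)
print(contact, traj, nudged)
```

In this sketch a small `scale` keeps the correction gentle, which loosely mirrors the stated goal of moving the action toward the desired manipulation without pushing it off the learned action manifold.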
### Summary

The main goal of this paper is to develop a new robot manipulation policy, **AffordDP**, that combines transferable affordances with diffusion models to improve generalization to unseen object instances and categories. The approach targets the generalization limits of existing diffusion-based policies and, according to the authors' experiments in simulated and real-world environments, succeeds on unseen instances and categories where previous diffusion-based methods fail.