GraspDiff: Grasping Generation for Hand-Object Interaction With Multimodal Guided Diffusion

Binghui Zuo, Zimeng Zhao, Wenqian Sun, Xiaohan Yuan, Zhipeng Yu, Yangang Wang
DOI: https://doi.org/10.1109/TVCG.2024.3466190
2024-09-23
Abstract: Grasping generation holds significant importance in both robotics and AI-generated content. While pure network paradigms based on VAEs or GANs ensure diversity in outcomes, they often fall short of achieving plausibility. On the other hand, two-step paradigms that first predict contact and then optimize distance yield plausible results but are known to be time-consuming. This paper introduces a novel paradigm powered by denoising diffusion probabilistic models (DDPM) that accepts diverse modalities with varying interaction granularities as generation conditions, including 3D objects, contact affordances, and image content. Our key idea is that the iterative steps inherent to diffusion models can supplant the iterative optimization routines of existing optimization-based methods, thereby endowing the results generated by our method with both diversity and plausibility. Trained on the same data, our paradigm achieves superior generation performance and competitive generation speed compared to optimization-based paradigms. Extensive experiments on both in-domain and out-of-domain objects demonstrate that our method achieves a significant improvement over the state-of-the-art (SOTA) method. We will release the code for research purposes.
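To make the key idea concrete, the sketch below shows generic conditional DDPM ancestral sampling, in which each reverse step refines the sample much as an iteration of an optimizer would. It is not the authors' implementation. It assumes the linear noise schedule from the original DDPM paper, a flat vector in place of real hand parameters, and a hypothetical oracle denoiser (`make_point_mass_denoiser`) that stands in for a trained, condition-aware network. All function names here are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (x_t, step_index, condition) -> predicted noise.
Denoiser = Callable[[Vector, int, Sequence[float]], Vector]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bars(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def make_point_mass_denoiser(num_steps: int) -> Denoiser:
    """Hypothetical oracle: the exact noise prediction if every valid sample
    equalled `condition`. A real model would be a trained network."""
    alpha_bars = cumulative_alpha_bars(linear_beta_schedule(num_steps))

    def denoiser(x: Vector, t: int, condition: Sequence[float]) -> Vector:
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * ci) / math.sqrt(1.0 - ab)
                for xi, ci in zip(x, condition)]

    return denoiser


def ddpm_sample(denoiser: Denoiser, condition: Sequence[float],
                num_steps: int = 100, seed: int = 0) -> Vector:
    """Reverse diffusion: start from Gaussian noise and denoise step by step.
    `num_steps` must match the schedule the denoiser was built with."""
    betas = linear_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars = cumulative_alpha_bars(betas)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(condition))]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):  # t is the 0-based step index
        eps = denoiser(x, t, condition)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x
```

With the oracle denoiser, the final step returns the condition vector up to floating-point error. A learned denoiser would instead return one of many plausible samples, and changing `seed` would yield a different sample from the same condition. This is where the diversity mentioned in the abstract would come from.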
What problem does this paper attempt to address?