Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

Huaijin Pi, Sida Peng, Minghui Yang, Xiaowei Zhou, Hujun Bao
2023-10-04
Abstract: This paper presents a novel approach to generating the 3D motion of a human interacting with a target object, with a focus on solving the challenge of synthesizing long-range and diverse motions, which could not be fulfilled by existing auto-regressive models or path planning-based methods. We propose a hierarchical generation framework to solve this challenge. Specifically, our framework first generates a set of milestones and then synthesizes the motion along them. Therefore, the long-range motion generation could be reduced to synthesizing several short motion sequences guided by milestones. The experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity. The source code is available on our project page: https://zju3dv.github.io/hghoi.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper addresses the problem of synthesizing 3D human-object interaction motions given a 3D scene model. Its focus is on generating long-range and diverse motion sequences, which existing autoregressive models and path planning-based methods struggle to produce. To this end, the paper proposes a hierarchical generation framework. The framework first predicts a series of "milestones," each representing a partial human pose and its position along the overall motion trajectory, and then generates the motion sequences between these milestones. This reduces long-range motion generation to the synthesis of several short motion sequences, which avoids the error accumulation of the autoregressive process and improves both the quality and the diversity of the generated motions. The framework uses diffusion probabilistic models to further improve the quality of the generated sequences. The paper thus targets two key issues: how to generate coherent and natural human motions, and how to ensure that the generated motions are diverse. Experiments on the NSM, COUCH, and SAMP datasets show that the proposed method outperforms previous methods by a large margin in both motion quality and diversity.
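The two-stage idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`sample_milestones`, `infill_segment`, `generate_motion`) are hypothetical, poses are reduced to plain 3D points, and both learned diffusion models are replaced by simple stand-ins (noisy waypoints and linear interpolation). It only shows how a long sequence can be assembled from short segments that are each pinned to a pair of milestones.

```python
import numpy as np


def sample_milestones(start, goal, num_milestones, rng, noise_scale=0.1):
    """Stand-in for the milestone generator (assumption: noisy waypoints).

    Returns an array of shape (num_milestones + 2, D) that includes the
    start and goal as the first and last rows.
    """
    ts = np.linspace(0.0, 1.0, num_milestones + 2)[:, None]
    points = (1.0 - ts) * start + ts * goal
    # Perturb only the intermediate milestones; endpoints stay fixed.
    points[1:-1] += rng.normal(scale=noise_scale, size=points[1:-1].shape)
    return points


def infill_segment(a, b, frames_per_segment):
    """Stand-in for the short-sequence motion model.

    Linearly interpolates from milestone a towards milestone b and
    excludes b itself, so consecutive segments do not duplicate frames.
    """
    ts = np.linspace(0.0, 1.0, frames_per_segment, endpoint=False)[:, None]
    return (1.0 - ts) * a + ts * b


def generate_motion(start, goal, num_milestones=3, frames_per_segment=10, seed=0):
    """Stage 1: sample milestones. Stage 2: fill in each short segment."""
    rng = np.random.default_rng(seed)
    milestones = sample_milestones(start, goal, num_milestones, rng)
    segments = [
        infill_segment(a, b, frames_per_segment)
        for a, b in zip(milestones[:-1], milestones[1:])
    ]
    # Append the final milestone so the sequence ends exactly at the goal.
    motion = np.concatenate(segments + [milestones[-1:]], axis=0)
    return milestones, motion


if __name__ == "__main__":
    start = np.zeros(3)
    goal = np.array([4.0, 0.0, 2.0])
    milestones, motion = generate_motion(start, goal)
    print(milestones.shape, motion.shape)
```

Because each segment depends only on its two bounding milestones, an error made in one segment cannot propagate into later ones, which is the property the summary contrasts with autoregressive rollout.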