ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek
2024-07-29
Abstract: Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes full-body reactive motion of a person in a two-person interaction scenario. Given the motion of one person, we employ a combined spatio-temporal cross-attention mechanism to synthesize the reactive body and hand motion of the second person, thereby completing the interactions between the two. We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics, where one person's movements have complex and diverse influences on the other. We also contribute the ReMoCap dataset for two-person interactions containing full-body and finger motions. We evaluate ReMoS through multiple quantitative metrics, qualitative visualizations, and a user study, and also indicate usability in interactive motion editing applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the generation of reactive motion in two-person interaction scenarios. Current 3D human motion synthesis techniques can generate high-quality digital human animations covering a wide variety of actions and gestures, but a notable gap remains in handling the complex dynamics of multi-person interactions. The paper proposes ReMoS (Reactive Motion Synthesis), a denoising diffusion-based model that synthesizes the full-body reactive motion of one person given the motion of the other. It targets two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics, where one person's movements have complex and diverse influences on the other. ReMoS uses a combined spatio-temporal cross-attention mechanism, which learns the motion dependencies between the two interacting individuals without additional annotation data. It also introduces a hand-interaction-aware cross-attention mechanism so that the hand joints respond appropriately to the other person's motion, which improves the realism of the result. The authors evaluate ReMoS with multiple quantitative metrics, qualitative visualizations, and a user study, and demonstrate its usability in interactive motion editing applications. In short, the main contribution is a framework that generates the full-body and hand reactive motion of one interactant based solely on the 3D motion of the other, without additional labels or text prompts, which could support new capabilities in animation production tools and software.
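To make the cross-attention idea concrete, the following is a minimal sketch of scaled dot-product cross-attention in which each frame of the reacting person (the query) attends over all frames of the acting person (the keys and values). It is not the authors' implementation: the function names `softmax` and `cross_attention` are hypothetical, the learned query/key/value projections of a real model are replaced by identity maps, and the toy two-dimensional features stand in for real pose features. The same routine could plausibly be applied over joints instead of frames to obtain a spatial variant, but that is an assumption rather than something the summary specifies.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(reactor_feats, actor_feats):
    """Each reactor token (query) attends over all actor tokens (keys/values).

    Assumption: identity projections are used in place of learned
    W_q, W_k, W_v matrices, so queries, keys, and values are the raw features.
    Returns (outputs, weights), one row per reactor token.
    """
    d = len(actor_feats[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in reactor_feats:
        # Similarity between this reactor token and every actor token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in actor_feats]
        w = softmax(scores)
        # Weighted average of actor features, one value per feature dimension.
        out = [sum(w_t * v[j] for w_t, v in zip(w, actor_feats)) for j in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy example: 3 actor frames and 2 (noisy) reactor frames, 2 features each.
actor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reactor_noisy = [[0.5, 0.1], [0.2, 0.9]]
attended, attn = cross_attention(reactor_noisy, actor)
for frame, row in zip(attended, attn):
    print("attended features:", frame, "attention weights:", row)
```

In a diffusion setting, a block like this would sit inside the denoising network, so that at every denoising step the noisy reactor motion is updated using information drawn from the actor's motion.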