Abstract: Close and continuous interaction with rich contacts is a crucial aspect of human activities (e.g., hugging, dancing) and of interest in many domains such as activity recognition, motion prediction, and character animation. However, acquiring such skeletal motion is challenging. While direct motion capture is expensive and slow, motion editing/generation is also non-trivial, as complex contact patterns with topological and geometric constraints have to be retained. To this end, we propose a new deep learning method for two-body skeletal interaction motion augmentation, which can generate variations of contact-rich interactions with varying body sizes and proportions while retaining the key geometric/topological relations between two bodies. Our system can learn effectively from a relatively small amount of data and generalize to drastically different skeleton sizes. Through exhaustive evaluation and comparison, we show it can generate high-quality motions, has strong generalizability, and outperforms traditional optimization-based methods and alternative deep learning solutions.

What problem does this paper attempt to address?

This paper addresses the problem of generating high-quality two-person skeletal interaction motion data, especially for interactions involving close and continuous physical contact (such as hugging, wrestling, and dancing). Specifically, the authors aim to address the following challenges:

1. **Cost and Difficulty of High-Quality Motion Capture**: Direct motion capture is costly and time-consuming, while data obtained with low-cost devices (such as RGB-D cameras) usually suffers from jitter and tracking errors.
2. **Limitations of Existing Datasets**: Most existing skeletal motion datasets cover either a single person or simple, short, nearly contact-free interactions between multiple people, and lack complex, continuous interaction data.
3. **Limitations of Traditional Optimization Methods**: Optimization-based methods require extensive manual tuning to handle complex geometric/topological constraints, and are too slow to generate data at scale.
4. **Deficiencies of Deep Learning Methods**: Although existing deep learning methods have been successful in single-person motion retargeting, they cannot be directly extended to two-person interactions because they do not model the geometric constraints between characters.

To solve these problems, the authors propose a new deep learning method for augmenting two-person skeletal interaction motions. The method generates motion variations for different skeleton sizes and proportions while retaining the key geometric/topological relationships between the two bodies. Specific contributions include:

- Proposing a new probabilistic decomposition of two-person interaction motions, enabling the model to learn effectively from limited data.
- Developing a new deep learning framework that can learn from a small number of training samples and generalize to two-person interaction motions of different body types.
- Creating a new dataset containing two-person interaction motions of different body types and proportions.

With these contributions, the authors' method can generate high-quality motions and significantly improve generalization, making it suitable for downstream tasks such as motion prediction and activity recognition.
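
The fourth challenge above can be illustrated with a toy example. The sketch below is not the paper's method; it only shows, under simplifying assumptions, why rescaling one character's bones independently breaks a contact with the other character. The helper `rescale_bones`, the three-joint arm chains, and the scale factors are all hypothetical and chosen purely for illustration.

```python
import math


def rescale_bones(joints, parents, scales):
    """Rescale each bone's length while keeping its direction.

    joints:  list of (x, y, z) joint positions
    parents: parent index per joint, -1 for the root (parents precede children)
    scales:  per-joint factor applied to the bone from the parent to that joint
    """
    out = [None] * len(joints)
    for j, p in enumerate(parents):
        if p < 0:
            # The root joint keeps its original position.
            out[j] = tuple(joints[j])
        else:
            # Offset from the parent in the original pose, scaled and
            # re-attached to the parent's new position.
            offset = [c - pc for c, pc in zip(joints[j], joints[p])]
            out[j] = tuple(o + scales[j] * d for o, d in zip(out[p], offset))
    return out


# Hypothetical toy data: two arm chains (shoulder -> elbow -> hand)
# whose hands touch at x = 0.6.
parents = [-1, 0, 1]
body_a = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0)]
body_b = [(1.2, 0.0, 0.0), (0.9, 0.0, 0.0), (0.6, 0.0, 0.0)]

# Hand-to-hand distance in the original motion (the hands are in contact).
contact_before = math.dist(body_a[2], body_b[2])

# Lengthen only character A's arm bones by 50%, ignoring character B.
scaled_a = rescale_bones(body_a, parents, [1.0, 1.5, 1.5])

# Hand-to-hand distance after the independent rescale.
contact_after = math.dist(scaled_a[2], body_b[2])

print(contact_before, contact_after)
```

In this toy setup the hands coincide before the rescale, while afterwards A's hand overshoots B's hand, so the contact is lost. Preserving such relations requires reasoning about both bodies jointly, which is the gap the summarized method targets.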

Two-Person Interaction Augmentation with Skeleton Priors

Inferring Object Properties from Human Interaction and Transferring Them to New Motions

Learning a Deep Motion Interpolation Network for Human Skeleton Animations

Full-body Motion Capture for Multiple Closely Interacting Persons

Capturing Closely Interacted Two-Person Motions with Reaction Priors

Recognition and Detection of Two-Person Interactive Actions Using Automatically Selected Skeleton Features

Automatic Human Scene Interaction through Contact Estimation and Motion Adaptation

Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening

A Sampling Approach to Generating Closely Interacting 3D Pose-Pairs from 2D Annotations

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

Shape and Pose Estimation for Closely Interacting Persons Using Multi-view Images

Simulation and Retargeting of Complex Multi-Character Interactions

Multi-Granularity Interaction for Multi-Person 3D Motion Prediction

Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition

Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

Reconstructing Close Human Interactions from Multiple Views

Two-person Graph Convolutional Network for Skeleton-based Human Interaction Recognition

Contact and Human Dynamics from Monocular Video

It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model

Skeleton-Aware Networks for Deep Motion Retargeting

AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos