Abstract: Human motion prediction, the task of predicting future 3D human poses given a sequence of observed ones, has been mostly treated as a deterministic problem. However, human motion is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. Alternatively, in this paper, we propose to stochastically combine the root of variations with previous pose information, so as to force the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments on two large-scale motion prediction datasets demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.

What problem does this paper attempt to address?

This paper addresses the lack of diversity in human motion prediction. Most existing methods are deterministic: given a sequence of observed poses, they predict only one future motion sequence. Human motion, however, is inherently stochastic, and the same observed poses can be followed by several plausible, distinct future motions. Capturing this diversity is therefore important for more accurate and natural motion prediction.

To overcome this limitation, the paper proposes a new stochastic conditioning scheme, Mix-and-Match perturbation. The method randomly selects a subset of the conditioning variables and perturbs it, which forces the model to take the random noise into account during generation and thereby improves the diversity and quality of the generated motion sequences.

The main contributions of the paper are:

1. **Mix-and-Match perturbation**: A novel method for introducing diversity in the conditional variational autoencoder (CVAE).
2. **New motion prediction model**: Capable of generating multiple plausible future pose sequences from the observed motion.
3. **New evaluation metric**: Quantitatively measures the quality and diversity of the generated motions, making it easier to compare different stochastic methods.
4. **Curriculum learning paradigm**: Trains the generative model with Mix-and-Match perturbation as the stochastic conditioning scheme, so that it performs well even when large perturbations are introduced.

Through these contributions, the paper aims to improve the diversity and accuracy of human motion prediction, bringing it closer to the complexity and uncertainty of real-world motion.
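The perturbation idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `mix_and_match`, the `rate` parameter, and the choice to perturb by overwriting the selected entries with Gaussian noise are all assumptions made for illustration. The point is only that the positions carrying noise are drawn at random on every call, so a downstream network cannot learn to ignore a fixed set of noise inputs.

```python
import numpy as np

def mix_and_match(condition, noise, rate, rng):
    """Overwrite a randomly chosen subset of `condition` entries with noise.

    Hypothetical helper: `rate` is the assumed fraction of entries to perturb.
    Returns the mixed vector and the indices that now carry noise.
    """
    d = condition.shape[-1]
    k = int(round(rate * d))                     # number of entries to perturb
    idx = rng.choice(d, size=k, replace=False)   # random, non-repeating positions
    mixed = condition.copy()
    mixed[..., idx] = noise[..., :k]             # selected positions take noise values
    return mixed, idx

rng = np.random.default_rng(0)
condition = np.ones(8)            # stand-in for an encoding of the observed poses
noise = rng.standard_normal(8)    # stand-in for the sampled random vector
mixed, idx = mix_and_match(condition, noise, rate=0.5, rng=rng)
```

In a curriculum-style setup, `rate` could start small and be increased over the course of training; the exact schedule is left open here because the summary does not specify it.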

A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

Forecasting Distillation: Enhancing 3D Human Motion Prediction with Guidance Regularization

Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

Stochastic Multi-Person 3D Motion Forecasting

CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion

Spatial–temporal modeling for prediction of stylized human motion

Human Motion Prediction Based on Space-Time-Separable Graph Convolutional Network

Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration

3D Human motion anticipation and classification

Human Motion Prediction Using Manifold-Aware Wasserstein GAN

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction

Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space

Stacked residual blocks based encoder-decoder framework for human motion prediction

SLAMP: Stochastic Latent Appearance and Motion Prediction

Multitask Non-Autoregressive Model For Human Motion Prediction

A Mixture of Experts Approach to 3D Human Motion Prediction

3D Skeleton-based Human Motion Prediction with Manifold-Aware GAN

Velocity-to-velocity human motion forecasting

Towards Realistic 3D Human Motion Prediction with A Spatio-temporal Cross-transformer Approach