A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould
DOI: https://doi.org/10.1109/cvpr42600.2020.00527
2020-06-01
Abstract: Human motion prediction, the task of predicting future 3D human poses given a sequence of observed ones, has been mostly treated as a deterministic problem. However, human motion is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. Alternatively, in this paper, we propose to stochastically combine the root of variations with previous pose information, so as to force the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments on two large-scale motion prediction datasets demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.
What problem does this paper attempt to address?
This paper addresses the lack of diversity in human motion prediction. Most existing methods are deterministic: given a sequence of observed poses, they predict a single future motion sequence. Human motion, however, is a stochastic process, and the same observed poses can plausibly lead to several different futures. Capturing this diversity is therefore important for accurate and natural motion prediction.

To overcome the limitations of existing methods, the paper proposes a new stochastic conditioning scheme, Mix-and-Match perturbation. The method randomly selects a part of the conditioning variables and perturbs it, which forces the model to take the random noise into account during generation and thereby improves the diversity and quality of the generated motion sequences.

The main contributions of the paper are:

1. **Mix-and-Match perturbation**: A novel method for introducing diversity in a conditional variational autoencoder (CVAE).
2. **New motion prediction model**: Capable of generating multiple plausible future pose sequences from the observed motion.
3. **New evaluation metric**: Quantitatively measures the quality and diversity of the generated motions, which makes different stochastic methods easier to compare.
4. **Curriculum learning paradigm**: Trains the generative model with Mix-and-Match perturbation as the stochastic conditioning scheme, so that it performs well even when large perturbations are introduced.

Through these contributions, the paper aims to improve both the diversity and the accuracy of human motion prediction, so that predictions better reflect the uncertainty of real-world motion.
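To make the idea of perturbing a randomly selected part of the conditioning variables more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `mix_and_match`, the mixing rate `alpha`, and the choice to overwrite the selected entries of the conditioning vector `h` with the matching entries of a noise vector `z` are all hypothetical. The point it tries to show is that the positions carrying noise change on every call, so a downstream layer cannot simply learn to zero out fixed "noise" inputs, as it could with a deterministic concatenation.

```python
import numpy as np


def mix_and_match(h, z, alpha, rng):
    """Illustrative sketch (assumed form, not the paper's code).

    Replaces a randomly chosen fraction `alpha` of the entries of the
    conditioning vector `h` with the corresponding entries of the noise
    vector `z`. Returns the perturbed vector and the chosen indices.
    """
    h = np.asarray(h, dtype=float)
    z = np.asarray(z, dtype=float)
    if h.ndim != 1 or h.shape != z.shape:
        raise ValueError("h and z must be 1-D vectors of the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    length = h.shape[0]
    num_noise = int(round(alpha * length))
    # Draw a fresh index set on every call: this is the stochastic part.
    noise_idx = rng.choice(length, size=num_noise, replace=False)

    perturbed = h.copy()
    perturbed[noise_idx] = z[noise_idx]
    return perturbed, noise_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal(8)  # stand-in for an encoded pose history
    for _ in range(3):
        z = rng.standard_normal(8)  # fresh noise for each sample
        perturbed, idx = mix_and_match(h, z, alpha=0.5, rng=rng)
        print(sorted(idx.tolist()), np.round(perturbed, 2))
```

Each loop iteration prints a different index set, so repeated calls with the same `h` yield different conditioning vectors. In a full model the perturbed vector would feed the decoder; that part is omitted here.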