Abstract: In stochastic human motion prediction (SHMP), researchers have often turned to generative models such as GANs, VAEs, and diffusion models. However, most previous approaches struggle to predict motions that are both realistic and coherent with past motion, because the latent distribution receives no guidance. In this paper, we introduce Semantic Latent Directions (SLD) as a solution to this challenge, constraining the latent space to learn meaningful motion semantics and thereby improving the accuracy of SHMP. SLD defines a series of orthogonal latent directions and represents each hypothesis of future motion as a linear combination of these directions. This creates an information bottleneck that pushes the model to capture meaningful motion semantics, which improves the precision of motion predictions. SLD also offers controllable prediction: the coefficients of the latent directions can be adjusted during inference. Building on SLD, we introduce a set of motion queries to enhance the diversity of predictions. Aligning these motion queries with the SLD space further promotes accurate and coherent motion predictions. Through extensive experiments on widely used benchmarks, we show that our method predicts motions more accurately while maintaining a balance of realism and diversity. Our code and pretrained models are available at https://github.com/GuoweiXu368/SLD-HMP.

What problem does this paper attempt to address?

The paper addresses how to generate accurate and controllable future motion predictions that are both realistic and consistent with past movements in Stochastic Human Motion Prediction (SHMP). Existing generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models can generate diverse motion predictions, but they often produce unrealistic or inconsistent motions because the latent distribution lacks effective guidance. Specifically, these methods struggle to learn meaningful human motion representations when capturing future motion distributions, which limits their prediction accuracy.

To address these issues, the paper proposes a new method called "Semantic Latent Directions" (SLD). SLD constructs a series of orthogonal latent directions and represents future motion as a linear combination of these directions, thereby learning meaningful motion semantics in the latent space. This approach improves the accuracy of motion predictions and also makes predictions controllable, since the coefficients of the latent directions can be adjusted. Additionally, SLD introduces a set of learnable motion queries, further enhancing the diversity and accuracy of the predictions.

In summary, the main contributions of the paper include:

1. Pointing out that the latent motion space in existing generative frameworks lacks necessary constraints, making it difficult to learn meaningful human motion representations.
2. Introducing a new method, Semantic Latent Directions (SLD), which constructs a latent semantic motion space to achieve accurate and controllable human motion predictions.
3. Demonstrating the superior performance of the method on stochastic human motion prediction through extensive experiments on widely used benchmark datasets.
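The core idea described above, a latent code built as a linear combination of orthogonal directions whose coefficients can be edited at inference time, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: in the paper the directions are learned, whereas here they are obtained from a QR decomposition of a random matrix purely for illustration. The names `make_orthonormal_directions` and `combine`, the latent size of 16, and the choice of 4 directions are all assumptions made for this example.

```python
import numpy as np


def make_orthonormal_directions(latent_dim, num_directions, seed=0):
    """Return `num_directions` orthonormal row vectors in R^latent_dim.

    Assumption: a random matrix orthogonalised with QR stands in for the
    learned directions. Requires num_directions <= latent_dim.
    """
    rng = np.random.default_rng(seed)
    raw = rng.standard_normal((latent_dim, num_directions))
    q, _ = np.linalg.qr(raw)  # columns of q are orthonormal
    return q.T                # shape: (num_directions, latent_dim)


def combine(directions, coeffs):
    """Latent code as a linear combination of the direction vectors."""
    return coeffs @ directions


# Hypothetical sizes chosen only for the example.
directions = make_orthonormal_directions(latent_dim=16, num_directions=4)

# One motion hypothesis is described by 4 coefficients instead of a free
# 16-dimensional vector: this is the bottleneck the summary refers to.
coeffs = np.array([0.5, -1.0, 0.0, 2.0])
z = combine(directions, coeffs)

# Controllable prediction: changing one coefficient moves the latent code
# along exactly one direction and leaves the others untouched.
edited = coeffs.copy()
edited[2] += 1.5
z_edited = combine(directions, edited)

# Because the directions are orthonormal, projecting the latent code back
# onto them recovers the coefficients.
recovered = directions @ z
```

In a full model, `z` would be passed to a decoder together with the encoded past motion to produce a future pose sequence; that part is omitted here because the source does not describe the architecture.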

Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
