MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty

Leo Bringer, Joey Wilson, Kira Barton, Maani Ghaffari

2024-10-05

Abstract: This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) that integrates and synchronizes skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable uncertainty. Existing methods for motion forecasting or motion generation rely solely on either prior motions or text prompts, facing limitations with precision or control, particularly over extended durations. The multi-modal nature of our approach enhances the contextual understanding of human motion, while our graph-based transformer framework effectively captures both spatial and temporal motion dynamics. As a result, our model consistently outperforms existing generative techniques in accurately predicting long-term motions. Additionally, by leveraging diffusion models' ability to capture different modes of prediction, we estimate uncertainty, significantly improving spatial awareness in human-robot interactions by incorporating zones of presence with varying confidence levels for each body joint.
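The per-joint "zones of presence" mentioned in the abstract can be illustrated with a small sketch. It assumes, for illustration only, that uncertainty is summarized from N stochastic samples drawn from the diffusion model: each joint's zone is centred on the sample mean, and its radius at confidence level p is the empirical p-quantile of the samples' distances from that mean. The function name presence_zones, the (N, T, J, 3) array layout and the default confidence levels are hypothetical choices, not the paper's exact formulation.

```python
import numpy as np

def presence_zones(samples, levels=(0.5, 0.9, 0.99)):
    """Summarize N sampled futures as per-joint presence zones (illustrative sketch).

    samples: array of shape (N, T, J, 3), i.e. N stochastic predictions of
             T future frames for J joints in 3D (hypothetical layout).
    levels:  confidence levels p in [0, 1] (hypothetical defaults).

    Returns:
      mean   (T, J, 3): per-joint mean position across samples
      spread (T, J):    RMS distance of the samples from that mean
      radii  {p: (T, J)}: radius containing roughly a fraction p of the samples
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)                        # average over the N samples
    dist = np.linalg.norm(samples - mean, axis=-1)     # (N, T, J) Euclidean distances
    spread = np.sqrt((dist ** 2).mean(axis=0))         # RMS distance per frame and joint
    # Empirical quantiles make no assumption about the shape of the distribution.
    radii = {p: np.quantile(dist, p, axis=0) for p in levels}
    return mean, spread, radii
```

For example, presence_zones(samples)[2][0.9] gives, for every future frame and joint, the radius that contains roughly 90% of the sampled positions. Empirical quantiles are used instead of a Gaussian fit so that multi-modal sample sets are not forced into a single ellipsoid.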

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to generate accurate long-term human motion predictions with quantifiable uncertainty for human-robot collaboration (HRC). Existing human motion prediction and generation methods rely on a single data source, either past motion data alone or text prompts alone, which limits their precision or controllability over long horizons. These limitations are especially problematic in dynamic collaborative scenarios that demand precise interaction, collision avoidance, and efficient trajectory planning.

To address them, the paper proposes MDMP (Multi-modal Diffusion model for Motion Prediction), which integrates and synchronizes skeletal data with textual descriptions of actions to produce refined long-term motion predictions together with uncertainty estimates. The multi-modal input improves the model's contextual understanding of human motion, and its graph-based transformer framework captures motion dynamics in both space and time; the paper reports that this lets MDMP outperform existing generative techniques in long-term prediction accuracy. Finally, because a diffusion model can capture multiple modes of prediction, MDMP estimates uncertainty in the form of zones of presence with varying confidence levels around each body joint, which improves spatial awareness in human-robot interaction.
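One common way to make a transformer aware of the skeleton graph is to restrict attention between joints to those connected by a few bones. The sketch below shows this for a single frame, assuming a hypothetical bone list, a hypothetical hops parameter and single-head attention written in NumPy. It illustrates the general idea of graph-masked spatial attention and is not MDMP's actual architecture; a temporal counterpart would attend across frames for each joint instead of across joints.

```python
import numpy as np

def skeleton_mask(num_joints, bones, hops=1):
    """Boolean (J, J) mask: True where two joints are within `hops` bones.

    bones: list of (parent, child) index pairs (hypothetical skeleton).
    Self-connections are always allowed.
    """
    adj = np.eye(num_joints, dtype=bool)
    for a, b in bones:
        adj[a, b] = adj[b, a] = True
    reach = adj.copy()
    for _ in range(hops - 1):
        # Extend reachability by one more bone per iteration.
        reach = (reach.astype(int) @ adj.astype(int)) > 0
    return reach

def masked_joint_attention(x, wq, wk, wv, mask):
    """Single-head self-attention over the joints of one frame (illustrative).

    x: (J, D) joint features; wq, wk, wv: (D, D) projection matrices;
    mask: (J, J) boolean, True where attention is permitted.
    Returns the attended features (J, D) and the attention weights (J, J).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block non-neighbouring joints
    scores -= scores.max(axis=-1, keepdims=True)   # stable softmax (diagonal keeps rows finite)
    weights = np.exp(scores)                       # exp(-inf) == 0 for blocked pairs
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With hops=1 each joint attends only to itself and its direct neighbours; increasing hops widens the receptive field along the kinematic chain, trading locality for longer-range coordination.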

Forecasting Distillation: Enhancing 3D Human Motion Prediction with Guidance Regularization

MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion

Human Motion Diffusion Model

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

Text-driven Human Motion Generation with Motion Masked Diffusion Model

ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction

DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using Attention-Based Generative Adverserial Network

Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction

Enhanced Multimodal Trajectory Prediction for Autonomous Vehicles Using Advanced Diffusion Model Techniques

Stochastic Multi-Person 3D Motion Forecasting

Future Motion Dynamic Modeling Via Hybrid Supervision for Multi-Person Motion Prediction Uncertainty Reduction

MMM: Generative Masked Motion Model

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

Flexible Motion In-betweening with Diffusion Models

CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion

TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Executing Your Commands Via Motion Diffusion in Latent Space