Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, Mårten Björkman

2023-04-03

Abstract: Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper addresses two problems in human motion synthesis and prediction: generating diverse, controllable motions from past observations, and coping with imperfect poses. The researchers propose MoDiff, an autoregressive diffusion model intended to improve both the quality and the robustness of generated motion. Deterministic models tend to produce little diversity; probabilistic models produce more, but still struggle with long-term generation and imperfect data. MoDiff combines a cross-modal Transformer encoder with a Transformer-based decoder to capture temporal correlations in the motion and control modalities, and adds a data dropout method based on the diffusion forward process to provide richer data representations and more robust generation. The paper reports that MoDiff outperforms two baselines in controllable locomotion synthesis and that diffusion data dropout benefits robust synthesis and reconstruction.

Summary:

- **Problem**: Generating diverse and controllable human motions while handling imperfect poses.
- **Method**: MoDiff, a framework combining an autoregressive diffusion model with a cross-modal Transformer encoder and a Transformer-based decoder.
- **Innovation**: A data dropout method based on the diffusion forward process, aimed at richer data representations and more robust generation.
- **Applications**: Fields such as interactive media and social robotics.
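To make the idea of data dropout based on the diffusion forward process more concrete, here is a minimal sketch in plain Python. It uses the standard DDPM forward process, in which a noisy sample is drawn as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard normal. Everything else is an assumption made for illustration: the function names, the linear noise schedule, the choice to corrupt each past frame with a randomly drawn step, and the `max_step` parameter are hypothetical and are not taken from the paper, which this summary does not describe at that level of detail.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(pose, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) for a 1-based step index t."""
    a_bar = alpha_bars[t - 1]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in pose]


def diffusion_dropout(past_poses, alpha_bars, max_step, rng):
    """Hypothetical data dropout: noise each past frame with a random forward step.

    A drawn step of 0 leaves the frame untouched, so clean and corrupted
    frames are mixed within one conditioning window.
    """
    corrupted = []
    for pose in past_poses:
        t = rng.randint(0, max_step)  # inclusive on both ends
        if t == 0:
            corrupted.append(list(pose))
        else:
            corrupted.append(forward_diffuse(pose, t, alpha_bars, rng))
    return corrupted


# Usage: corrupt a toy two-frame history of 3-dimensional poses.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(100))
past = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]
noisy_past = diffusion_dropout(past, alpha_bars, max_step=20, rng=rng)
```

Under these assumptions, a model trained on `noisy_past` as its conditioning history would see past poses at varying corruption levels, which is one plausible reading of how such dropout could encourage robustness to imperfect input poses.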

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Interactive Character Control with Auto-Regressive Motion Diffusion Models

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion

MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model

Human Motion Diffusion Model

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis

DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Taming Diffusion Probabilistic Models for Character Control

TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction

DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

AMD: Autoregressive Motion Diffusion

MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation