Motion synthesis via distilled absorbing discrete diffusion model

Zheng, Chao; Liu, Bangli
DOI: https://doi.org/10.1007/s00530-024-01492-9
IF: 3.9
2024-10-17
Multimedia Systems
Abstract: In this work, we explore the potential of discrete diffusion models in text-driven motion synthesis. Previous methods aimed at improving the quality of generated motions often increased the number of model parameters while neglecting the diversity of the generated results. Here we introduce our Motion Absorbing Discrete Diffusion Model (MADDM), which combines the high diversity of continuous diffusion models with the high-quality results of discrete autoregressive models. Our results show that an absorbing discrete diffusion model can yield more precise discrete motion latent codes than previous autoregressive generation models. In MADDM, a lightweight discrete denoising model is designed to achieve more accurate generation results; it uses cross-layer parameter sharing to reduce the number of parameters. The model is distilled with a reweighted distribution loss, which adapts the distillation process more effectively to the discrete diffusion model. Our approach achieves state-of-the-art results on the HumanML3D dataset, with an FID of 0.073 and only one-third of the parameters of the previous discrete autoregressive model.
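As a rough illustration of what "absorbing" means in the abstract, the sketch below shows the generic forward (corruption) step of an absorbing-state discrete diffusion process: each discrete code is independently replaced by a special mask token with a probability that grows with the timestep. It is not MADDM's implementation. The linear schedule t / T, the `MASK` id, the `absorb` function, and the sample code indices are all assumptions made for illustration.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def absorb(tokens, t, T, rng):
    """Corrupt a sequence of discrete codes to timestep t out of T.

    Assumes a linear schedule: each token is independently replaced by
    MASK with probability t / T, so t = 0 leaves the input intact and
    t = T masks every position.
    """
    if not 0 <= t <= T:
        raise ValueError("t must satisfy 0 <= t <= T")
    p = t / T
    return [MASK if rng.random() < p else tok for tok in tokens]


rng = random.Random(0)
codes = [12, 7, 301, 45, 45, 9]  # made-up motion codebook indices
for t in (0, 5, 10):
    print(t, absorb(codes, t, 10, rng))
```

A denoising model for such a process is trained to predict the original codes at the masked positions, and generation runs the process in reverse, starting from a fully masked sequence.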
computer science, information systems, theory & methods