Abstract: While motion generation has made substantial progress, its practical application remains constrained by dataset diversity and scale, limiting its ability to handle out-of-distribution scenarios. To address this, we propose a simple and effective baseline, RMD, which enhances the generalization of motion generation through retrieval-augmented techniques. Unlike previous retrieval-based methods, RMD requires no additional training and offers three key advantages: (1) the external retrieval database can be flexibly replaced; (2) body parts from the motion database can be reused, with an LLM facilitating splitting and recombination; and (3) a pre-trained motion diffusion model serves as a prior to improve the quality of motions obtained through retrieval and direct combination. Without any training, RMD achieves state-of-the-art performance, with notable advantages on out-of-distribution data.

What problem does this paper attempt to address?

This paper addresses the generalization problem in **human motion generation**, in particular **out-of-distribution (OOD) motion generation**. Existing methods perform poorly on OOD text inputs for two main reasons:

1. **Combinatorial complexity**: Human motions can be combined in so many ways that it is difficult for a training set to cover all possible full-body motions.
2. **Description diversity**: Motion descriptions vary widely, which leaves a persistent gap between test and training prompts.

To address these problems, the authors propose a simple and effective baseline, **RMD (Retrieval-augmented Motion Diffuse)**. It improves the generalization of motion generation in the following ways:

- **No additional training required**: RMD is a training-free, retrieval-augmented method.
- **Hierarchical retrieval strategy**: Motions are decomposed into different levels (full-body, half-body, fine-grained) to bridge the gap described above.
- **Pretrained diffusion model**: A pretrained diffusion model refines the synthesized motions, improving motion coordination and generation diversity.

### Method overview

The RMD workflow has two main stages:

1. **Motion retrieval stage**:
   - A customizable external motion database is used to select and combine relevant motions according to the user's text prompt.
   - A multi-level retrieval pipeline (full-body, half-body, fine-grained) retrieves motion segments from the database and recombines them into complete motions.
2. **Motion diffusion stage**:
   - A pretrained motion diffusion model refines the combined motions to improve their quality and diversity.
   - A noise-and-denoise scheme first adds noise to the retrieved motions, then uses the pretrained model to remove that noise under the guidance of the input text, which improves motion quality.

### Experimental results

Experiments show that RMD achieves state-of-the-art performance on both the standard benchmark HumanML3D and the cross-domain dataset Mixamo, especially on OOD data. User studies also show that RMD has significant advantages on OOD text prompts and generates more realistic motions that better match the text descriptions.

### Summary

RMD provides a lightweight, easy-to-implement, high-performance method, systematically evaluates existing methods in OOD scenarios, and sets a new benchmark for future general-purpose motion generation.
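The noise-and-denoise scheme of the diffusion stage can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the noise schedule or the sampler, so the code assumes a linear DDPM schedule and a deterministic DDIM-style update. The names `make_alpha_bar`, `add_noise`, `refine`, `toy_denoiser`, and `t_start` are hypothetical, and `toy_denoiser` is only a temporal smoothing filter standing in for a pretrained text-conditioned motion diffusion model.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear DDPM schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def refine(retrieved_motion, text, denoiser, alpha_bar, t_start, rng):
    """Noise the retrieved motion to step t_start, then denoise back to step 0.

    `denoiser(x_t, t, text)` is a hypothetical callable that returns an
    estimate of the clean motion; a real system would call a pretrained
    text-conditioned motion diffusion model here.
    """
    x = add_noise(retrieved_motion, t_start, alpha_bar, rng)
    for t in range(t_start, 0, -1):
        x0_hat = denoiser(x, t, text)
        # Noise implied by the current sample and the clean-motion estimate.
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        # Deterministic DDIM-style step (eta = 0) from t to t - 1.
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    return denoiser(x, 0, text)

def toy_denoiser(x_t, t, text):
    """Stand-in for a pretrained model: smooths each channel over time, ignores text."""
    kernel = np.ones(5) / 5.0
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, x_t)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
# Hypothetical retrieved motion: 60 frames x 6 feature channels.
retrieved = np.cumsum(rng.standard_normal((60, 6)), axis=0) * 0.1
refined = refine(retrieved, "a person waves while walking", toy_denoiser,
                 alpha_bar, t_start=300, rng=rng)
print(refined.shape)
```

In this kind of scheme the starting step controls a trade-off: a small `t_start` keeps the result close to the retrieved motion, while a larger one lets the diffusion prior change more of it.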

RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

Human Motion Diffusion Model

DiverseMotion: Towards Diverse Human Motion Generation Via Discrete Diffusion

AMD: Autoregressive Motion Diffusion

FG-MDM: Towards Zero-Shot Human Motion Generation via ChatGPT-Refined Descriptions

Efficient Text-driven Motion Generation via Latent Consistency Training

Executing Your Commands Via Motion Diffusion in Latent Space

Motion Mamba: Efficient and Long Sequence Motion Generation

MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty

Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models

Generative Model-Enhanced Human Motion Prediction

AMG: Avatar Motion Guided Video Generation

Large Motion Model for Unified Multi-Modal Motion Generation

Lifting Motion to the 3D World via 2D Diffusion

MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

Human Motion Diffusion as a Generative Prior

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models