Abstract: Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions, where embeddings from the same motion are closely grouped in a small area and shorter motions occupy even less space. Our empirical observations indicate that this limits the representational capacity and diversity under each skill. An ideal latent space should be maximally packed by the embedding clusters of all motions. In this paper, we propose a skill-conditioned controller that learns diverse skills with expressive variations. Our approach leverages the Neural Collapse phenomenon, a natural outcome of the classification-based encoder, to uniformly distribute cluster centers. We additionally propose a novel Embedding Expansion technique to form stylistic embedding clusters for diverse skills that are uniformly distributed on a hypersphere, maximizing the representational area occupied by each skill and minimizing unmapped regions. This maximally packed and uniformly distributed embedding space ensures that embeddings within the same cluster generate behaviors that conform to the characteristics of the corresponding motion clips, yet exhibit noticeable variations within each cluster. Compared to existing methods, our controller not only generates high-quality, diverse motions covering the entire dataset but also achieves superior controllability, motion coverage, and diversity under each skill. Both qualitative and quantitative results confirm these traits, enabling our controller to be applied to a wide range of downstream tasks and to serve as a cornerstone for diverse applications.
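
The "uniformly distributed cluster centers" mentioned in the abstract can be illustrated with the geometry usually associated with Neural Collapse, in which class means approach a simplex equiangular tight frame: unit vectors whose pairwise cosine is -1/(K-1) for K classes. The following is a minimal NumPy sketch of that geometry only, not the paper's encoder; the function name `simplex_etf` and the choice of five skills are assumptions made for illustration.

```python
import numpy as np


def simplex_etf(num_skills: int) -> np.ndarray:
    """Return `num_skills` unit vectors (as rows) forming a simplex equiangular tight frame."""
    k = num_skills
    # Subtract the mean direction from each one-hot vector, then rescale to unit length.
    centered = np.eye(k) - np.full((k, k), 1.0 / k)
    return np.sqrt(k / (k - 1)) * centered


# Hypothetical example: five skills, i.e. five cluster centers.
centers = simplex_etf(5)
cosines = centers @ centers.T  # diagonal: 1, off-diagonal: -1 / (5 - 1)
```

Every pair of centers is separated by the same angle, which is the sense in which such centers are spread as evenly as possible on the hypersphere.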

What problem does this paper attempt to address?

This paper addresses the challenge of learning natural and diverse behaviors from human motion datasets in physics-based character control. Specifically:

1. **Limitations of existing methods**:
   - **Conditional adversarial models**: These models often produce tight and biased embedding distributions. Embeddings from the same motion cluster in a small region of the latent space, and shorter motions occupy even less space. This limits the expressiveness and diversity of each skill.
   - **Exploration burden**: Some methods mitigate this problem by constructing an independent embedding space for each motion, but doing so introduces a mixed discrete-continuous embedding space, which makes learning harder and increases the exploration burden on high-level policies.
2. **Ideal goals**:
   - **Maximize representational capacity**: The ideal latent space distributes the embedding clusters of all motions evenly, so that each skill occupies as large a region of the latent space as possible and unmapped regions are minimized.
   - **Improve diversity and controllability**: The generated motions should conform to the characteristics of the corresponding motion clips while showing noticeable variation within each cluster, which improves both the diversity and the controllability of the controller.
3. **Solutions**:
   - **Classification encoder and Neural Collapse**: Train the encoder with a classification objective so that the features of different motions are evenly distributed on a high-dimensional sphere. Under Neural Collapse, the features of each class converge toward a single point, forming equally spaced class centers.
   - **Embedding Expansion**: Expand the embedding region around each skill cluster so that embeddings within a cluster generate diverse behaviors while remaining stylistically consistent.
   - **Interval Motion-Progress Encoding**: To ensure that complete action sequences are generated, especially for complex actions such as combos, motion-progress information is introduced to guide the controller toward complete and meaningful sequences.

In summary, this paper proposes a new skill-conditioned controller that achieves high-quality, diverse, and controllable motion synthesis by optimizing the latent-space distribution and applying the Embedding Expansion technique, and that is suitable for a wide range of application scenarios.
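
The idea of expanding the embedding region around each skill center can be pictured as sampling unit vectors inside a spherical cap around that center. The sketch below is a hypothetical illustration under stated assumptions, not the paper's actual Embedding Expansion procedure: the helper `sample_in_cap`, the cap radius of 0.45 times the angle between centers, and the simplex-style centers are all choices made here for demonstration.

```python
import numpy as np


def sample_in_cap(center: np.ndarray, max_angle: float, rng: np.random.Generator) -> np.ndarray:
    """Draw one unit vector whose angle to the unit vector `center` is below `max_angle` radians."""
    noise = rng.standard_normal(center.shape)
    tangent = noise - (noise @ center) * center  # remove the component along the center
    tangent /= np.linalg.norm(tangent)
    angle = rng.uniform(0.0, max_angle)          # note: not area-uniform over the cap
    return np.cos(angle) * center + np.sin(angle) * tangent


num_skills = 6        # assumed number of skills
per_skill = 50        # assumed number of samples per skill

# Assumed equally spaced unit centers with pairwise cosine -1 / (num_skills - 1).
centers = np.sqrt(num_skills / (num_skills - 1)) * (np.eye(num_skills) - 1.0 / num_skills)
center_angle = np.arccos(-1.0 / (num_skills - 1))
max_angle = 0.45 * center_angle  # just under half the spacing, so caps do not overlap

rng = np.random.default_rng(0)
samples = np.stack([
    sample_in_cap(centers[s], max_angle, rng)
    for s in range(num_skills)
    for _ in range(per_skill)
])
labels = np.repeat(np.arange(num_skills), per_skill)

# Each sample should still be closest (by cosine) to the center it was drawn around.
nearest = np.argmax(samples @ centers.T, axis=1)
```

Because each cap radius stays below half the angle between centers, every sampled embedding remains nearest to its own skill center, which mirrors the goal of giving each skill a large region without losing style consistency.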

Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters
