Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters

Nian Liu, Libin Liu, Zilong Zhang, Zi Wang, Hongzhao Xie, Tengyu Liu, Xinyi Tong, Yaodong Yang, Zhaofeng He
2024-11-10
Abstract: Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions where embeddings from the same motion are closely grouped in a small area and shorter motions occupy even less space. Our empirical observations indicate this limits the representational capacity and diversity under each skill. An ideal latent space should be maximally packed by all motions' embedding clusters. In this paper, we propose a skill-conditioned controller that learns diverse skills with expressive variations. Our approach leverages the Neural Collapse phenomenon, a natural outcome of the classification-based encoder, to uniformly distribute cluster centers. We additionally propose a novel Embedding Expansion technique to form stylistic embedding clusters for diverse skills that are uniformly distributed on a hypersphere, maximizing the representational area occupied by each skill and minimizing unmapped regions. This maximally packed and uniformly distributed embedding space ensures that embeddings within the same cluster generate behaviors conforming to the characteristics of the corresponding motion clips, yet exhibiting noticeable variations within each cluster. Compared to existing methods, our controller not only generates high-quality, diverse motions covering the entire dataset but also achieves superior controllability, motion coverage, and diversity under each skill. Both qualitative and quantitative results confirm these traits, enabling our controller to be applied to a wide range of downstream tasks and serving as a cornerstone for diverse applications.
Graphics
What problem does this paper attempt to address?
This paper addresses the challenge of learning natural and diverse behaviors from human motion datasets in physics-based character control. Specifically:

1. **Limitations of existing methods**:
   - **Conditional adversarial models**: These models often produce tight and biased embedding distributions. Embeddings from the same motion cluster in a small area of the latent space, and shorter motions occupy even less space. This limits the expressiveness and diversity of each skill.
   - **Exploration burden**: Some methods alleviate this problem by constructing an independent embedding space for each motion, but doing so introduces a mixed discrete-continuous embedding space, which increases the learning difficulty and exploration burden of high-level policies.
2. **Ideal goals**:
   - **Maximize representational capacity**: The ideal latent space distributes the embedding clusters of all motions uniformly, so that each skill occupies as large an area of the latent space as possible and unmapped regions are minimized.
   - **Improve diversity and controllability**: Generated motions should conform to the characteristics of the corresponding motion clips while still showing noticeable variation within each cluster, which improves both the diversity and the controllability of the controller.
3. **Solutions**:
   - **Classification encoder and Neural Collapse phenomenon**: A classification-based encoder distributes the features of different motions on a hypersphere. Under Neural Collapse, the features of each class tend to converge to a single point, and the resulting class centers are equally spaced.
   - **Embedding Expansion**: Expanding the embedding region around each skill's cluster center lets the embeddings within a cluster generate diverse behaviors while remaining consistent with that skill's style.
   - **Interval Motion-Progress Encoding**: To ensure that complete action sequences are generated, especially for complex actions such as combos, motion-progress information is introduced to guide the controller toward complete and meaningful sequences.

In summary, this paper proposes a new skill-conditioned controller that achieves high-quality, diverse, and controllable motion synthesis by optimizing the latent-space distribution and applying the Embedding Expansion technique, making it suitable for a wide range of applications.
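
The geometry behind the Neural Collapse and Embedding Expansion ideas described above can be illustrated with a small numerical sketch. It is not the paper's method: in the paper the equally spaced centers emerge from training a classifier, whereas here they are constructed directly as a simplex equiangular tight frame (the configuration Neural Collapse is known to converge to). The noise-based sampling is only a stand-in for Embedding Expansion, whose actual formulation this summary does not give. The function names `simplex_etf` and `sample_around_center`, and the `noise_scale` parameter, are hypothetical.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Build num_classes unit vectors in R^dim with equal pairwise angles.

    Assumes dim >= num_classes. Returns an array of shape (num_classes, dim).
    """
    if dim < num_classes:
        raise ValueError("this construction assumes dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns U (dim x K) from a reduced QR decomposition.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    # Center the identity so the K vectors sum to zero, then rescale to unit norm.
    centered = np.eye(k) - np.ones((k, k)) / k
    m = np.sqrt(k / (k - 1)) * (u @ centered)
    return m.T


def sample_around_center(center, num_samples, noise_scale, rng):
    """Illustrative stand-in for expansion: perturb a center, project to the sphere."""
    noise = noise_scale * rng.standard_normal((num_samples, center.shape[0]))
    noisy = center + noise
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)


if __name__ == "__main__":
    centers = simplex_etf(num_classes=8, dim=16)
    gram = centers @ centers.T
    # Every pair of distinct centers has cosine similarity -1 / (K - 1).
    print("off-diagonal cosine:", gram[0, 1], "expected:", -1.0 / 7.0)

    rng = np.random.default_rng(0)
    samples = sample_around_center(centers[3], 100, noise_scale=0.05, rng=rng)
    # Samples vary, yet the nearest center for each one is still skill 3.
    nearest = np.argmax(samples @ centers.T, axis=1)
    print("fraction assigned to skill 3:", np.mean(nearest == 3))
```

In this toy setting, a larger `noise_scale` makes each cluster cover more of the hypersphere, at the risk of samples drifting toward neighboring skills. That mirrors the trade-off described above between maximizing the area each skill occupies and keeping embeddings consistent with their own skill.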