Real-time Animation Generation and Control on Rigged Models via Large Language Models

Han Huang, Fernanda De La Torre, Cathy Mengying Fang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier
2024-02-16
Abstract: We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate LLM's potential to enable flexible state transition between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
Graphics, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to generate and control animations on rigged models in real time from natural language input. The authors embed a large language model (LLM) in Unity so that it can both generate new animations from natural language descriptions and transition flexibly between existing animations. They present qualitative results on a variety of rigged models and motions as evidence of the method's robustness and flexibility.

### Main Contributions:

1. **Animation Generation**: The LLM outputs structured text that can be parsed into diverse and realistic animations. The position and rotation time series of each joint are encoded as structured strings and then parsed into animations.
2. **Animation Control**: With the LLM integrated into Unity, character animation state transitions are achieved by generating and executing appropriate Unity C# scripts. This allows seamless integration of pre-stored animations and custom game logic.

### Method Overview:

- **Animation Generation**: The 3D model is abstracted as a tree structure in which each joint has an associated motion time series. The LLM generates structured text containing the appropriate joint hierarchy and motions, and this output is optimized through keyframe compression and floating-point truncation.
- **Animation Control**: Animation state transitions are achieved by generating Unity C# scripts. These scripts can be compiled and executed at runtime, allowing users to control animations through natural language commands.

### Application Scenarios:

- **Animation Draft Generation**: Generating animation drafts for digital artists, much as frameworks like Midjourney and Stable Diffusion do for images.
- **Game Development**: Generating and controlling animations in real time, which can make game development more efficient and flexible.

### Conclusion:

By embedding an LLM in Unity, the method generates animations from natural language and transitions flexibly between animation states in real time, providing new tools for animation production and game development.
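
The animation-generation steps summarized under Method Overview (joint tree, per-joint time series encoded as structured text, floating-point truncation, keyframe compression) can be illustrated with a minimal sketch. It is written in Python rather than Unity C# for brevity. The text format (leading dashes for tree depth, `time x y z` samples separated by `|`), the names `Joint`, `truncate`, `parse_animation`, and `compress`, and the tolerance value are all assumptions made for illustration; the summary does not specify the paper's actual format or implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Joint:
    """One node of the rig tree with its motion time series."""
    name: str
    keyframes: list = field(default_factory=list)  # [(time, (x, y, z)), ...]
    children: list = field(default_factory=list)


def truncate(value, decimals=2):
    """Cut a float to a fixed number of decimals (toward zero)."""
    scale = 10 ** decimals
    return math.trunc(value * scale) / scale


def parse_animation(text, decimals=2):
    """Parse the hypothetical structured text into a joint tree; return the root."""
    root, stack = None, []  # stack[d] is the most recent joint seen at depth d
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        body = line.lstrip("-")
        depth = len(line) - len(body)  # number of leading dashes
        name, _, series = body.partition(":")
        joint = Joint(name.strip())
        for sample in series.split("|"):
            t, x, y, z = (truncate(float(v), decimals) for v in sample.split())
            joint.keyframes.append((t, (x, y, z)))
        del stack[depth:]  # climb back up to this joint's parent
        if stack:
            stack[-1].children.append(joint)
        else:
            root = joint
        stack.append(joint)
    return root


def compress(keyframes, tol=1e-3):
    """Drop a keyframe when linear interpolation between the last kept
    keyframe and the next one reproduces it within tol.
    Assumes strictly increasing times."""
    if len(keyframes) < 3:
        return list(keyframes)
    kept = [keyframes[0]]
    for cur, nxt in zip(keyframes[1:], keyframes[2:]):
        (t0, v0), (t1, v1), (t2, v2) = kept[-1], cur, nxt
        w = (t1 - t0) / (t2 - t0)
        predicted = [a + w * (b - a) for a, b in zip(v0, v2)]
        if any(abs(p - c) > tol for p, c in zip(predicted, v1)):
            kept.append(cur)
    kept.append(keyframes[-1])
    return kept


# Hypothetical LLM output: one joint per line, dashes mark depth in the tree.
SAMPLE = """
Hips: 0 0 0 0 | 0.5 0 10 0 | 1 0 20 0
-Spine: 0 0 0 0 | 0.5 5.1234 0 0 | 1 0 0 0
--Head: 0 0 0 0 | 1 0 30 0
-LeftLeg: 0 0 0 0 | 1 -3.987 0 0
"""

root = parse_animation(SAMPLE)
hips_compressed = compress(root.keyframes)
```

In this sketch the middle `Hips` keyframe lies on the straight line between its neighbours, so `compress` drops it, while the `Spine` keyframes are all kept. Shorter numbers and fewer keyframes both reduce the amount of text the model has to emit, which is the plausible motivation for the two optimizations.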