LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang
2024-07-03
Abstract: Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes, and we propose a stitching module and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed reaches 12.8 ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of animating a static portrait so that the result is lifelike and expressive, while keeping inference fast and the motion precisely controllable. Specifically, its goals are:

1. **Higher-Quality Animation Generation**: Improve generation quality and generalization by enhancing the existing implicit-keypoint framework (larger training data, mixed image-video training, an upgraded architecture, and better motion transformation and optimization objectives).
2. **Enhanced Controllability**: Introduce a stitching module and two retargeting modules, each a small MLP, to give precise control over eye and lip movements.
3. **Improved Inference Efficiency**: Reach a generation speed of 12.8 ms on high-end hardware (an RTX 4090 GPU with PyTorch).

According to the authors, the method is competitive in quality with diffusion-based methods while offering better computational efficiency and controllability. It can also handle portraits in various styles, including both realistic and anime-style images.
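To make the idea of a "small MLP with negligible computational overhead" concrete, here is a minimal, illustrative sketch of how a tiny MLP could predict offsets that are added to a set of driving keypoints. This is not the authors' implementation: the function names (`make_mlp`, `mlp_forward`, `apply_offsets`), the keypoint count, the layer sizes, the choice of inputs, and the random untrained weights are all assumptions made for illustration only. The real modules are available in the linked repository.

```python
import random


def make_mlp(sizes, seed=0):
    """Build (weights, biases) layers with small random weights (untrained, hypothetical)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Run a plain fully connected network with ReLU on the hidden layers."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU, hidden layers only
    return x


def apply_offsets(layers, source_kp, driving_kp):
    """Predict one offset per coordinate and add it to the driving keypoints.

    Assumption: the MLP sees the flattened source and driving keypoints.
    """
    flat = [c for kp in source_kp + driving_kp for c in kp]
    offsets = mlp_forward(layers, flat)
    dims = len(driving_kp[0])
    return [
        [kp[d] + offsets[i * dims + d] for d in range(dims)]
        for i, kp in enumerate(driving_kp)
    ]


# Hypothetical sizes: 4 keypoints with 3 coordinates each, one hidden layer of 16 units.
NUM_KP, DIMS = 4, 3
layers = make_mlp([2 * NUM_KP * DIMS, 16, NUM_KP * DIMS])
source_kp = [[0.1 * i, 0.2 * i, 0.0] for i in range(NUM_KP)]
driving_kp = [[0.1 * i + 0.05, 0.2 * i, 0.01] for i in range(NUM_KP)]
adjusted_kp = apply_offsets(layers, source_kp, driving_kp)
print(adjusted_kp)
```

Because the network only touches a handful of keypoint coordinates rather than image pixels, its cost is tiny compared with the rest of an animation pipeline, which is consistent with the abstract's claim of negligible overhead.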