LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

2024-07-03

Abstract: Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait
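To put the reported speed in more familiar terms, the short calculation below converts the 12.8 ms figure into a frame rate. It is a sketch under one assumption not stated explicitly in the abstract: that 12.8 ms is the latency to generate a single frame. The function name and the list of playback rates are illustrative choices, not part of the paper.

```python
# Latency reported in the abstract (RTX 4090, PyTorch).
# Assumption: this is the time to generate ONE frame.
LATENCY_MS = 12.8


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


fps = frames_per_second(LATENCY_MS)

# Illustrative playback rates to compare against (not from the paper).
PLAYBACK_RATES = (24, 30, 60)

# Ratio above 1.0 means generation is faster than playback at that rate.
headroom = {rate: fps / rate for rate in PLAYBACK_RATES}

print(f"{fps:.1f} frames per second")
for rate, ratio in headroom.items():
    print(f"{rate} fps playback: {ratio:.2f}x headroom")
```

Under that assumption, the figure corresponds to roughly 78 frames per second, which leaves headroom even over 60 fps playback. Time for anything outside the generation step itself, such as video decoding or face cropping, is not covered by this estimate.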

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the problem of animating a static portrait so that the result is lifelike and expressive, while keeping inference fast and the motion precisely controllable. Specifically, its goals are:

1. **Better Generation Quality and Generalization**: Strengthen the existing implicit-keypoint framework by scaling the training data to about 69 million high-quality frames, adopting mixed image-video training, upgrading the network architecture, and designing better motion transformation and optimization objectives.
2. **Enhanced Controllability**: Introduce a stitching module and two retargeting modules, built on a small MLP with negligible computational overhead, to achieve precise control over eye and lip movements.
3. **Improved Inference Efficiency**: Reach a generation speed of 12.8 ms on an RTX 4090 GPU with PyTorch.

Compared with diffusion-based methods, the proposed method is reported to be competitive in quality while offering advantages in computational efficiency and controllability. It can also handle a range of portrait styles, including both real-life and anime-style images.
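The idea of a small MLP acting on implicit keypoints can be illustrated with a toy example. The sketch below is not the authors' architecture: the keypoint count, the hidden layer sizes, the random weights, and the choice to feed concatenated source and driving keypoints and add the predicted offsets to the driving keypoints are all assumptions made for illustration. It uses only the Python standard library so the parameter count is easy to check by hand.

```python
import random

NUM_KEYPOINTS = 21  # assumption: number of implicit keypoints
DIM = 3             # assumption: each keypoint is an (x, y, z) point


def make_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def make_mlp(sizes, seed=0):
    """Build a list of dense layers from consecutive layer sizes."""
    rng = random.Random(seed)
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def linear(layer, x):
    """Apply y = W x + b for a single input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(layers, x):
    """ReLU after every layer except the last one."""
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in linear(layer, x)]
    return linear(layers[-1], x)


def stitch(layers, source_kp, driving_kp):
    """Predict per-keypoint offsets and add them to the driving keypoints.

    Hypothetical interface: both inputs are lists of (x, y, z) tuples.
    """
    flat = [c for kp in source_kp + driving_kp for c in kp]
    offsets = mlp_forward(layers, flat)
    return [tuple(c + offsets[i * DIM + j] for j, c in enumerate(kp))
            for i, kp in enumerate(driving_kp)]


in_dim = 2 * NUM_KEYPOINTS * DIM   # source and driving keypoints, flattened
out_dim = NUM_KEYPOINTS * DIM      # one offset per keypoint coordinate
layers = make_mlp([in_dim, 64, 32, out_dim])  # hidden sizes are assumptions

rng = random.Random(1)
source = [tuple(rng.uniform(-1, 1) for _ in range(DIM))
          for _ in range(NUM_KEYPOINTS)]
driving = [tuple(rng.uniform(-1, 1) for _ in range(DIM))
           for _ in range(NUM_KEYPOINTS)]

adjusted = stitch(layers, source, driving)
num_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(f"{len(adjusted)} adjusted keypoints, {num_params} parameters")
```

With these illustrative sizes the network has on the order of ten thousand parameters, which gives a feel for why such a module adds little to the overall cost. In a trained system the weights would come from the optimization objectives described in the paper rather than from random initialization.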

