AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Huawei Wei, Zejun Yang, Zhisheng Wang
2024-03-26
Abstract: In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment. We release code and model weights at
Computer Vision and Pattern Recognition, Graphics, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the problem of generating realistic portrait animation driven by audio. Existing methods struggle to produce high-quality, visually coherent animations because this requires coordinating lip movements, facial expressions, and head poses. The proposed AniPortrait framework works in two stages. First, it extracts a 3D facial mesh and head poses from the audio and projects them into a sequence of 2D facial landmarks. Second, it uses a diffusion model, coupled with a motion module, to transform the landmark sequence into temporally consistent, photorealistic portrait animation. According to the authors' experiments, the approach performs well in facial naturalness, pose diversity, and visual quality. It also offers flexibility and controllability, making it applicable to facial motion editing and face reenactment.
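The first stage ends by turning the audio-predicted 3D facial mesh, posed by the predicted head pose, into 2D landmarks on the image plane. The snippet below is a minimal sketch of that projection step only, assuming a simple pinhole camera. The function names, the Euler-angle order, and the intrinsics (`focal`, `cx`, `cy`) are illustrative assumptions, not AniPortrait's actual code or camera model, and the audio-to-mesh and audio-to-pose networks are not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]
Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def matmul3(a: Matrix3, b: Matrix3) -> Matrix3:
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw: float, pitch: float, roll: float) -> Matrix3:
    """Head-pose rotation R = Rz(roll) @ Ry(yaw) @ Rx(pitch), in radians.

    Assumption: this axis order is illustrative, not the paper's convention.
    """
    c_p, s_p = math.cos(pitch), math.sin(pitch)
    c_y, s_y = math.cos(yaw), math.sin(yaw)
    c_r, s_r = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, c_p, -s_p], [0.0, s_p, c_p]]
    ry = [[c_y, 0.0, s_y], [0.0, 1.0, 0.0], [-s_y, 0.0, c_y]]
    rz = [[c_r, -s_r, 0.0], [s_r, c_r, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def project_landmarks(vertices: Sequence[Point3], rotation: Matrix3,
                      translation: Point3, focal: float,
                      cx: float, cy: float) -> List[Point2]:
    """Pose 3D mesh vertices and project them with a pinhole camera.

    Hypothetical intrinsics: focal length `focal` in pixels and principal
    point (cx, cy); camera axes follow the OpenCV convention
    (x right, y down, z forward).
    """
    landmarks = []
    for v in vertices:
        # Rigid head-pose transform into camera coordinates: p = R v + t.
        p = [sum(rotation[i][k] * v[k] for k in range(3)) + translation[i]
             for i in range(3)]
        if p[2] <= 0.0:
            raise ValueError("vertex lies behind the camera")
        # Perspective divide, then shift by the principal point.
        landmarks.append((focal * p[0] / p[2] + cx,
                          focal * p[1] / p[2] + cy))
    return landmarks


if __name__ == "__main__":
    R = rotation_matrix(yaw=0.0, pitch=0.0, roll=0.0)
    mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(project_landmarks(mesh, R, (0.0, 0.0, 10.0),
                            focal=500.0, cx=256.0, cy=256.0))
    # [(256.0, 256.0), (306.0, 256.0)]
```

With zero rotation, a camera 10 units away, and a 500-pixel focal length, a vertex one unit to the right of the face origin lands 50 pixels right of the principal point. Applying this per frame to the predicted mesh and pose yields the landmark sequence that the second, diffusion-based stage consumes.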