InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars

Xiaochen Zhao,Jingxiang Sun,Lizhen Wang,Jinli Suo,Yebin Liu

2024-05-27

Abstract: While high fidelity and efficiency are central to the creation of digital head avatars, recent methods relying on 2D or 3D generative models often experience limitations such as shape distortion, expression inaccuracy, and identity flickering. Additionally, existing one-shot inversion techniques fail to fully leverage multiple input images for detailed feature extraction. We propose a novel framework, Incremental 3D GAN Inversion, that enhances avatar reconstruction performance using an algorithm designed to increase the fidelity from multiple frames, resulting in improved reconstruction quality proportional to frame count. Our method introduces a unique animatable 3D GAN prior with two crucial modifications for enhanced expression controllability alongside an innovative neural texture encoder that categorizes texture feature spaces based on UV parameterization. Differentiating from traditional techniques, our architecture emphasizes pixel-aligned image-to-image translation, mitigating the need to learn correspondences between observation and canonical spaces. Furthermore, we incorporate ConvGRU-based recurrent networks for temporal data aggregation from multiple frames, boosting geometry and texture detail reconstruction. The proposed paradigm demonstrates state-of-the-art performance on one-shot and few-shot avatar animation tasks. Code will be available at https://github.com/XChenZ/invertAvatar.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to address several key issues in digital head avatar creation, particularly in terms of high fidelity and efficiency. Specifically:

1. **Shape Distortion**: Existing 2D generative models produce shape distortions when handling large movements due to the lack of geometric constraints.
2. **Inaccurate Expressions and Identity Flickering**: Methods based on 3D GANs often result in inaccurate expressions and identity flickering during animation because motion and appearance are naturally entangled in the latent space.
3. **Single Image Limitation**: Current one-shot inversion techniques rely on a single source image, which is insufficient to fully represent the subject, as a single image may contain occlusions and limited pose information.

To address these issues, the authors propose a new framework, Incremental 3D GAN Inversion, which leverages multiple input images to enhance avatar reconstruction performance, thereby improving the accuracy of reconstruction details and enabling the generation of high-fidelity 3D facial avatars in a short time. Additionally, the method introduces an innovative neural texture encoder and a recurrent network to accumulate temporal data from multiple frames, further improving the reconstruction quality of geometric and texture details.
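
The recurrent multi-frame aggregation mentioned above can be illustrated with a toy convolutional GRU update. This is only a minimal sketch under stated assumptions, not the authors' implementation: it uses a single-channel feature map, random untrained 3x3 kernels, and pure Python, whereas the paper's ConvGRU would be a learned, multi-channel network. The names `ConvGRUCell`, `conv3x3`, `zip_map`, and `aggregate` are hypothetical. The gate convention `h_new = (1 - z) * h + z * n` is one of two common, equivalent choices.

```python
import math
import random


def conv3x3(fmap, kernel):
    """Zero-padded 3x3 cross-correlation of a single-channel 2D map."""
    rows, cols = len(fmap), len(fmap[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[di + 1][dj + 1] * fmap[y][x]
            out[i][j] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def zip_map(fn, *maps):
    """Apply fn element-wise across same-sized 2D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


class ConvGRUCell:
    """Toy single-channel ConvGRU cell with random (untrained) kernels."""

    def __init__(self, seed=0, scale=0.5):
        rng = random.Random(seed)
        # One input kernel (x*) and one hidden kernel (h*) per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.k = {
            name: [[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(3)]
            for name in ("xz", "hz", "xr", "hr", "xn", "hn")
        }

    def step(self, x, h):
        gate = lambda a, b: sigmoid(a + b)
        z = zip_map(gate, conv3x3(x, self.k["xz"]), conv3x3(h, self.k["hz"]))
        r = zip_map(gate, conv3x3(x, self.k["xr"]), conv3x3(h, self.k["hr"]))
        reset_h = zip_map(lambda a, b: a * b, r, h)
        n = zip_map(lambda a, b: math.tanh(a + b),
                    conv3x3(x, self.k["xn"]), conv3x3(reset_h, self.k["hn"]))
        # Blend the previous state with the candidate, per pixel.
        return zip_map(lambda zv, hv, nv: (1.0 - zv) * hv + zv * nv, z, h, n)


def aggregate(frames, cell):
    """Fold a sequence of per-frame feature maps into one hidden state."""
    h = [[0.0] * len(frames[0][0]) for _ in frames[0]]
    for x in frames:
        h = cell.step(x, h)
    return h
```

Because the hidden state keeps the spatial layout of the per-frame features, each additional frame refines the same pixel-aligned map instead of being pooled into a single vector. That is the property that lets a recurrent design accept a variable number of input frames.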

Generalizable and Animatable Gaussian Head Avatar

GANHead: Towards Generative Animatable Neural Head Avatars

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

Two Birds with One Stone: Transforming and Generating Facial Images with Iterative GAN

GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar

3D GAN Inversion with Facial Symmetry Prior

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Meta-Auxiliary Network for 3D GAN Inversion

Two Birds with One Stone: Iteratively Learn Facial Attributes with GANs

Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding

Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing

Out-of-domain GAN Inversion Via Invertibility Decomposition for Photo-Realistic Human Face Manipulation

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

3D GAN Inversion with Pose Optimization

Monocular 3D Object Reconstruction with GAN Inversion

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images

EGAIN: Extended GAn INversion