InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars

Xiaochen Zhao, Jingxiang Sun, Lizhen Wang, Jinli Suo, Yebin Liu
2024-05-27
Abstract: While high fidelity and efficiency are central to the creation of digital head avatars, recent methods relying on 2D or 3D generative models often experience limitations such as shape distortion, expression inaccuracy, and identity flickering. Additionally, existing one-shot inversion techniques fail to fully leverage multiple input images for detailed feature extraction. We propose a novel framework, **Incremental 3D GAN Inversion**, that enhances avatar reconstruction performance using an algorithm designed to increase the fidelity from multiple frames, resulting in improved reconstruction quality proportional to frame count. Our method introduces a unique animatable 3D GAN prior with two crucial modifications for enhanced expression controllability alongside an innovative neural texture encoder that categorizes texture feature spaces based on UV parameterization. Differentiating from traditional techniques, our architecture emphasizes pixel-aligned image-to-image translation, mitigating the need to learn correspondences between observation and canonical spaces. Furthermore, we incorporate ConvGRU-based recurrent networks for temporal data aggregation from multiple frames, boosting geometry and texture detail reconstruction. The proposed paradigm demonstrates state-of-the-art performance on one-shot and few-shot avatar animation tasks. Code will be available at https://github.com/XChenZ/invertAvatar.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address several key issues in digital head avatar creation, particularly in terms of high fidelity and efficiency. Specifically:

1. **Shape Distortion**: Existing 2D generation models produce shape distortions when handling large movements due to the lack of geometric constraints.
2. **Inaccurate Expressions and Identity Flickering**: Methods based on 3D GANs often result in inaccurate expressions and identity flickering during animation because motion and appearance are naturally entangled in the latent space.
3. **Single Image Limitation**: Current one-shot inversion techniques rely on a single source image, which is insufficient to fully represent the subject, as a single image may contain occlusions and limited pose information.

To address these issues, the authors propose a new framework, Incremental 3D GAN Inversion, which leverages multiple input images to enhance avatar reconstruction performance, thereby improving the accuracy of reconstruction details and enabling the generation of high-fidelity 3D facial avatars in a short time. Additionally, the method introduces an innovative neural texture encoder and a recurrent network to accumulate temporal data from multiple frames, further improving the reconstruction quality of geometric and texture details.
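
To make the idea of accumulating information across frames more concrete, the following is a minimal sketch of a generic ConvGRU update, not the authors' implementation. It assumes single-channel feature maps, hand-set 3x3 kernels, and pure Python lists so that it runs without dependencies; the paper's networks would operate on multi-channel feature maps with learned weights in a deep-learning framework. The names `conv3x3`, `conv_gru_step`, `aggregate`, and `demo_params` are hypothetical, and the gate convention used here (state = (1 - z) * old + z * candidate) is only one of the common variants.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv3x3(grid, kernel):
    """Zero-padded 3x3 cross-correlation on a 2D list; output keeps the input size."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[di + 1][dj + 1] * grid[y][x]
            out[i][j] = acc
    return out


def conv_gru_step(x, h, p):
    """One ConvGRU update: x is the new frame's feature map, h the running state."""
    rows, cols = len(x), len(x[0])
    idx = [(i, j) for i in range(rows) for j in range(cols)]

    def combine(a, b, fn):
        # Apply fn elementwise to the sum of two maps.
        out = [[0.0] * cols for _ in range(rows)]
        for i, j in idx:
            out[i][j] = fn(a[i][j] + b[i][j])
        return out

    # Update gate z and reset gate r, both computed by convolutions.
    z = combine(conv3x3(x, p["Wz"]), conv3x3(h, p["Uz"]), sigmoid)
    r = combine(conv3x3(x, p["Wr"]), conv3x3(h, p["Ur"]), sigmoid)
    # Candidate state uses the reset-gated previous state.
    gated = [[r[i][j] * h[i][j] for j in range(cols)] for i in range(rows)]
    cand = combine(conv3x3(x, p["Wc"]), conv3x3(gated, p["Uc"]), math.tanh)
    # Blend old state and candidate per pixel.
    return [[(1.0 - z[i][j]) * h[i][j] + z[i][j] * cand[i][j]
             for j in range(cols)] for i in range(rows)]


def aggregate(frames, p):
    """Fold a sequence of per-frame feature maps into one state, frame by frame."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]  # assumption: zero initial state
    for x in frames:
        h = conv_gru_step(x, h, p)
    return h


def demo_params(scale=0.1):
    """Hypothetical fixed kernels; a trained model would learn these weights."""
    k = [[scale] * 3 for _ in range(3)]
    return {name: k for name in ("Wz", "Uz", "Wr", "Ur", "Wc", "Uc")}


if __name__ == "__main__":
    frames = [[[0.5, -0.2], [0.1, 0.3]], [[0.4, 0.0], [0.2, -0.1]]]
    print(aggregate(frames, demo_params()))
```

Because the state is updated one frame at a time, additional frames can be folded in without reprocessing earlier ones, which is the property an incremental, multi-frame inversion scheme relies on.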