FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Jiawei Zhang, Zijian Wu, Zhiyang Liang, Yicheng Gong, Dongfang Hu, Yao Yao, Xun Cao, Hao Zhu
2024-11-24
Abstract: Reconstructing high-fidelity, animatable 3D head avatars from effortlessly captured monocular videos is a pivotal yet formidable challenge. Although significant progress has been made in rendering performance and manipulation capabilities, notable challenges remain, including incomplete reconstruction and inefficient Gaussian representation. To address these challenges, we introduce FATE, a novel method for reconstructing an editable full-head avatar from a single monocular video. FATE integrates a sampling-based densification strategy to ensure optimal positional distribution of points, improving rendering efficiency. A neural baking technique is introduced to convert discrete Gaussian representations into continuous attribute maps, facilitating intuitive appearance editing. Furthermore, we propose a universal completion framework to recover non-frontal appearance, culminating in a 360$^\circ$-renderable 3D head avatar. FATE outperforms previous approaches in both qualitative and quantitative evaluations, achieving state-of-the-art performance. To the best of our knowledge, FATE is the first animatable and 360$^\circ$ full-head monocular reconstruction method for a 3D head avatar. The code will be publicly released upon publication.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two main problems in reconstructing high-quality, animatable 3D head models from monocular videos:

1. **Incomplete head modeling**:
   - Most previous methods focus on modeling the frontal face and cannot recover the back of the head. They rely on parametric face estimation, which fails, whether landmark-based or landmark-free, when no facial features are visible.
   - Most portrait videos concentrate on information-rich frontal views, while rear views carry less information and are rarely captured. Recovering a complete 360° 3D head for unseen viewpoints (such as the side and the back) therefore remains an unsolved challenge.
2. **Inefficiency and discreteness of the 3D Gaussian representation**:
   - The densification mechanism of the original 3D Gaussian Splatting (3DGS) is poorly suited to monocular reconstruction: it generates a large number of redundant points during training, which reduce rendering quality and increase model complexity.
   - Because the 3D Gaussian representation is discrete, a head represented by 3DGS cannot be edited directly in UV texture space the way a polygonal mesh can. Some previous editable methods rely on pre-trained diffusion models (such as InstructPix2Pix), which are both time-consuming and difficult to control.

To solve these problems, the authors propose FATE (Full-head Gaussian Avatar with Textural Editing), with the following improvements:

- **Sampling-based densification strategy**: A sampling-based densification method yields a better positional distribution of points than previous methods, improving rendering efficiency.
- **Neural baking technique**: The discrete Gaussian representation is converted into continuous attribute maps, so that appearance can be edited intuitively in UV space.
- **Universal completion framework**: Prior knowledge is extracted from the pre-trained generative model SphereHead and used for appearance customization, recovering the appearance of non-frontal views and yielding a 360°-renderable 3D head model.

FATE outperforms existing methods in both qualitative and quantitative evaluations and is, to the authors' knowledge, the first animatable, 360° full-head reconstruction method for 3D head avatars from monocular video.
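
To make the idea behind editing in UV space concrete, the following is a minimal sketch and not the authors' implementation. It assumes each Gaussian carries a fixed UV coordinate and reads its color from a continuous attribute map by bilinear interpolation, so that painting on the map changes the Gaussians' appearance. The function name `sample_attribute_map`, the 4×4 toy map, and the three UV coordinates are all hypothetical; how FATE actually produces the maps (the neural baking step itself) is not shown.

```python
import numpy as np

def sample_attribute_map(attr_map, uv):
    """Bilinearly sample an (H, W, C) attribute map at (N, 2) UV coords in [0, 1].

    Hypothetical helper for illustration; not part of any FATE code release.
    """
    h, w, _ = attr_map.shape
    # Map UV in [0, 1] to continuous pixel coordinates.
    x = np.clip(uv[:, 0], 0.0, 1.0) * (w - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (h - 1)
    # Indices of the four surrounding texels, clamped at the border.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets, broadcast over the channel axis.
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = attr_map[y0, x0] * (1 - fx) + attr_map[y0, x1] * fx
    bottom = attr_map[y1, x0] * (1 - fx) + attr_map[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

# Toy data (assumed): a black 4x4 RGB map and three Gaussians with fixed UVs.
color_map = np.zeros((4, 4, 3))
gaussian_uv = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
before = sample_attribute_map(color_map, gaussian_uv)

# "Edit the texture": paint the right half of the map red.
edited_map = color_map.copy()
edited_map[:, 2:, 0] = 1.0
after = sample_attribute_map(edited_map, gaussian_uv)

print(after[:, 0])  # red channel per Gaussian after the edit
```

The point of the sketch is the design choice the summary describes: once attributes live in a continuous map, an edit is an ordinary image operation, and every Gaussian picks up the change through its UV lookup without per-point manipulation.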