VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction

Haoyu Wu, Ziqiao Peng, Xukun Zhou, Yunfei Cheng, Jun He, Hongyan Liu, Zhaoxin Fan
2024-09-17
Abstract: 3D face reconstruction from monocular images has promoted the development of various applications such as augmented reality. Though existing methods have made remarkable progress, most of them emphasize geometric reconstruction, while overlooking the importance of texture prediction. To address this issue, we propose VGG-Tex, a novel Vivid Geometry-Guided Facial Texture Estimation model designed for High Fidelity Monocular 3D Face Reconstruction. The core of this approach is leveraging 3D parametric priors to enhance the outcomes of 2D UV texture estimation. Specifically, VGG-Tex includes a Facial Attributes Encoding Module, a Geometry-Guided Texture Generator, and a Visibility-Enhanced Texture Completion Module. These components are responsible for extracting parametric priors, generating initial textures, and refining texture details, respectively. Based on the geometry-texture complementarity principle, VGG-Tex also introduces a Texture-guided Geometry Refinement Module to further balance the overall fidelity of the reconstructed 3D faces, along with corresponding losses. Comprehensive experiments demonstrate that our method significantly improves texture reconstruction performance compared to existing state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to achieve high-quality geometry and texture estimation simultaneously in 3D face reconstruction from monocular images. Although existing methods have made significant progress in geometric reconstruction, most of them overemphasize recovering geometric structure and neglect texture prediction. The authors point out that even when the geometric details are not very fine, better texture can greatly improve the visual result. The paper therefore proposes a new model, VGG-Tex, which aims at high-fidelity monocular 3D face reconstruction by using detailed geometric information to guide texture estimation.

### Main Contributions

1. **Introduction of VGG-Tex**: A method for high-fidelity monocular 3D face reconstruction built on geometry-guided texture estimation.
2. **Development of Three Innovative Modules**:
   - **Facial Attributes Encoding Module**: Predicts the parameters of the FLAME model from the input image to reconstruct the 3D head geometry, and extracts a latent geometric embedding.
   - **Geometry-Guided Texture Generator**: Uses a Vision Transformer encoder and a texture decoder, combined with the latent geometric embedding, to generate the UV texture map.
   - **Visibility-Enhanced Texture Completion Module**: Adds random masks to the input image to simulate occluded regions, improving the model's performance in real-world scenarios.
3. **Texture-Guided Geometry Refinement Training Strategy**: Based on the geometry-texture complementarity principle, further improves the overall fidelity of the reconstructed 3D face, with correspondingly designed loss functions.
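The random-mask idea behind the Visibility-Enhanced Texture Completion Module can be illustrated with a small sketch. This is not the authors' implementation: the function name `apply_random_masks`, the rectangular mask shape, the number of masks, and the maximum mask size are all assumptions chosen for illustration, and the image is a plain grayscale grid rather than a tensor.

```python
import random


def apply_random_masks(image, num_masks=2, max_frac=0.4, fill=0.0, rng=None):
    """Occlude random rectangles of a 2D grayscale image (list of rows).

    Returns (masked_image, visibility), where visibility[y][x] is 1 for
    pixels that were kept and 0 for pixels that were masked out.
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    masked = [row[:] for row in image]          # copy so the input is untouched
    visibility = [[1] * w for _ in range(h)]
    for _ in range(num_masks):
        # Sample a rectangle no larger than max_frac of each dimension.
        mh = rng.randint(1, max(1, int(h * max_frac)))
        mw = rng.randint(1, max(1, int(w * max_frac)))
        top = rng.randint(0, h - mh)
        left = rng.randint(0, w - mw)
        for y in range(top, top + mh):
            for x in range(left, left + mw):
                masked[y][x] = fill
                visibility[y][x] = 0
    return masked, visibility


if __name__ == "__main__":
    img = [[1.0] * 8 for _ in range(8)]
    occluded, vis = apply_random_masks(img, rng=random.Random(0))
    for row in vis:
        print("".join("#" if v else "." for v in row))
```

During training, a texture model would be asked to produce the full texture from `occluded`, with `vis` indicating which pixels it actually observed.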
### Method Overview

VGG-Tex adopts a dual-branch network architecture:

- **Facial Attributes Encoding Module**: Processes the original face image, predicts 3D FLAME parameters to reconstruct the geometry, and extracts the latent geometric embedding.
- **Geometry-Guided Texture Generator**: Takes the segmented image patches, extracts texture embeddings with the Vision Transformer encoder, combines them with the latent geometric embedding, and generates the final UV texture map.
- **Visibility-Enhanced Texture Completion Module**: During training, adds random masks to simulate occluded regions, improving the robustness of the model.

### Loss Functions

- **Landmark Projection Loss**: Optimizes the shape, expression, and pose parameters by measuring the difference between the 2D landmarks of the input image and the projected landmarks of the 3D model.
- **Rendered Texture Loss**: Computes the error between the input image and the rendered image, measuring the difference between the real and predicted textures.
- **Visibility-Aware Texture Loss**: Accounts for texture losses from different viewing angles.
- **Identity Loss**: Constrains the identity features of the predicted texture by comparing features from the ArcFace model.
- **Visibility Loss**: Computes the error between the projection mask and the facial skin mask during rendering, preventing the texture generator from learning from pixels outside the facial area.

### Experimental Results

- **Quantitative Comparison**: In the texture estimation comparison on the NoW benchmark, VGG-Tex outperforms existing methods on multiple metrics, including SSIM, FID, LPIPS, and ID.
- **Geometric Reconstruction Performance**: In the geometric reconstruction comparison on the NoW benchmark, VGG-Tex achieves performance comparable to existing strong baselines.
- **Ablation Study**: Removing individual modules verifies the contribution of each module to the overall performance.
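The loss terms listed under Loss Functions can be made concrete with a toy sketch. The summary does not give the exact norms or weights, so everything here is an assumption: L1 distances for landmarks and pixels, `1 - cosine similarity` for identity, a simple weighted sum, and hypothetical function names. Identity features stand in for ArcFace embeddings as plain lists of floats.

```python
import math


def landmark_loss(pred, target):
    """Mean L1 distance between projected and detected 2D landmarks."""
    total = sum(abs(px - tx) + abs(py - ty)
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


def masked_photometric_loss(rendered, image, mask):
    """Mean absolute pixel error over pixels where mask == 1.

    The mask plays the role of a facial skin mask, so pixels outside the
    face do not contribute (an assumed form, not the paper's formula).
    """
    total, count = 0.0, 0
    for r_row, i_row, m_row in zip(rendered, image, mask):
        for r, i, m in zip(r_row, i_row, m_row):
            total += m * abs(r - i)
            count += m
    return total / max(count, 1)


def identity_loss(feat_a, feat_b):
    """1 - cosine similarity between two identity feature vectors."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return 1.0 - dot / (norm_a * norm_b)


def total_loss(terms, weights):
    """Weighted sum of named loss terms; the weights are hypothetical."""
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    terms = {
        "landmark": landmark_loss([(0, 0), (1, 1)], [(0, 0), (2, 3)]),
        "texture": masked_photometric_loss([[1, 1], [1, 1]],
                                           [[0, 1], [5, 1]],
                                           [[1, 1], [0, 1]]),
        "identity": identity_loss([1.0, 0.0], [0.0, 1.0]),
    }
    weights = {"landmark": 1.0, "texture": 1.0, "identity": 0.1}
    print(terms, total_loss(terms, weights))
```

In a real training setup these would operate on batched tensors inside a differentiable renderer; the scalar version only shows how each term is formed and combined.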
In conclusion, by proposing VGG-Tex, the paper addresses high-quality texture estimation in monocular 3D face reconstruction and offers new ideas and technical approaches for research in this field.