Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li,Yuhao Cheng,Xingyu Ren,Haozhe Jia,Di Xu,Wenhan Zhu,Yichao Yan
2024-07-15
Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos. It is widely used in movies and games for its ability to simulate facial muscle movements and recover dynamic textures such as pore squeezing. The industry often adopts a pipeline built on multi-view stereo and non-rigid alignment. However, this approach is prone to errors and relies heavily on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternating geometry and texture optimization frame by frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves better results than current SOTA face reconstruction methods in both mesh and texture quality. Project page: https://xuanchenli.github.io/Topo4D/
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?
This paper addresses three key challenges in 4D face reconstruction:
1. **High-quality 4D face asset reconstruction**: Existing methods struggle to generate dynamic topological meshes and corresponding texture maps from videos. 4D head capture aims to produce high-fidelity 4D face models that simulate facial muscle movements and recover dynamic texture effects such as pore squeezing.
2. **Maintaining temporal consistency**: Traditional pipelines based on multi-view stereo (MVS) and non-rigid alignment are error-prone and rely on time-consuming manual processing to ensure temporal consistency. They struggle to guarantee stability across frames, which leads to artifacts such as texture drift.
3. **Automated process**: The industry typically captures high-quality multi-view videos with professional equipment (such as a Light Stage), computes a facial scan for each frame with MVS, and then performs non-rigid registration to fit a topologically aligned face to each scan. Obtaining usable assets requires markers on the subject's face and manual post-processing by artists, which is time-consuming and depends heavily on human intervention. A more automated process is therefore needed to accelerate 4D asset reconstruction.
To solve these problems, the authors propose a new framework, **Topo4D**, which directly optimizes densely aligned 4D heads and 8K texture maps from calibrated multi-view time-series images. The main contributions of Topo4D are as follows:
- A new optimization framework that reconstructs high-quality 4D heads and photorealistic textures with pore-level details from multi-view videos.
- Gaussian Mesh and UV densification techniques that better represent facial models under a predefined topology and fixed UVs.
- An alternating geometry and texture optimization process that ensures temporal and topological consistency and a regular mesh arrangement during optimization.
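
To make the idea of a fixed topology concrete, here is a minimal Python sketch of Gaussian centers bound one-to-one to mesh vertices. It is an illustration under stated assumptions, not the authors' implementation: the names `Topology`, `GaussianMeshFrame`, and `advance` are hypothetical, and the other per-Gaussian attributes a real splatting pipeline carries (scale, rotation, opacity, color) are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Topology:
    """Connectivity and UVs shared by every frame (hypothetical container)."""
    faces: Tuple[Tuple[int, int, int], ...]
    uvs: Tuple[Tuple[float, float], ...]


@dataclass
class GaussianMeshFrame:
    """One frame: Gaussian center i is assumed to sit on mesh vertex i."""
    topology: Topology
    centers: List[Vec3]

    def extract_mesh(self):
        # Because centers are bound to vertices, reading the mesh back is
        # just the center positions plus the unchanged face list.
        return list(self.centers), self.topology.faces


def advance(frame: GaussianMeshFrame, offsets: List[Vec3]) -> GaussianMeshFrame:
    """Move the centers to the next frame while reusing the same topology."""
    if len(offsets) != len(frame.centers):
        raise ValueError("expected one offset per Gaussian center")
    moved = [
        (x + dx, y + dy, z + dz)
        for (x, y, z), (dx, dy, dz) in zip(frame.centers, offsets)
    ]
    return GaussianMeshFrame(frame.topology, moved)


# Usage: a single triangle that shifts along z between two frames.
topo = Topology(faces=((0, 1, 2),), uvs=((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
frame0 = GaussianMeshFrame(topo, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
frame1 = advance(frame0, [(0.0, 0.0, 0.1)] * 3)
verts1, faces1 = frame1.extract_mesh()
```

In this sketch only the vertex positions change between frames. The face list and UVs are shared, which is what lets every frame be read back as a mesh with identical connectivity.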
Through these innovations, Topo4D extracts meshes and textures with a fixed topology while maintaining high-quality rendering, and it outperforms existing state-of-the-art methods in the reported experiments.
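
The alternating, frame-by-frame schedule can be illustrated with a toy one-dimensional problem. The sketch below is an assumption-laden stand-in, not the paper's optimizer or loss: a single 1D Gaussian with a center `mu` (playing the role of geometry) and an amplitude `amp` (playing the role of texture) is fitted to a sequence of synthetic frames. Each round updates `mu` by one gradient step with `amp` held fixed, then solves for `amp` in closed form with `mu` held fixed. Each frame starts from the previous frame's result. All function names and hyperparameters are hypothetical.

```python
import math


def render(mu, amp, xs, sigma=1.0):
    """Render a 1D Gaussian 'splat' sampled at positions xs."""
    return [amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]


def geometry_step(mu, amp, xs, target, sigma=1.0, lr=0.05):
    """One gradient-descent step on the center, amplitude held fixed."""
    pred = render(mu, amp, xs, sigma)
    # d/dmu of 0.5 * sum((pred - target)^2)
    grad = sum(
        (p - t) * p * (x - mu) / sigma ** 2
        for p, t, x in zip(pred, target, xs)
    )
    return mu - lr * grad


def texture_step(mu, xs, target, sigma=1.0):
    """Closed-form least-squares amplitude, center held fixed."""
    basis = render(mu, 1.0, xs, sigma)
    return sum(b * t for b, t in zip(basis, target)) / sum(b * b for b in basis)


def fit_sequence(frames, xs, mu0, amp0, rounds=200):
    """Alternate geometry and texture updates, frame by frame."""
    mu, amp = mu0, amp0
    track = []
    for target in frames:
        # Warm start: each frame begins from the previous frame's estimate.
        for _ in range(rounds):
            mu = geometry_step(mu, amp, xs, target)
            amp = texture_step(mu, xs, target)
        track.append((mu, amp))
    return track


# Usage: three synthetic frames in which the splat drifts and dims.
xs = [-5.0 + 0.25 * i for i in range(41)]
truth = [(0.0, 1.0), (0.3, 0.9), (0.6, 0.8)]
frames = [render(m, a, xs) for m, a in truth]
track = fit_sequence(frames, xs, mu0=-0.5, amp0=0.5)
```

Splitting the update this way keeps each sub-problem simple: the amplitude has an exact solution once the center is fixed, and warm-starting from the previous frame keeps the center within reach of a small gradient step.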