Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos. It is widely used in movies and games for its ability to simulate facial muscle movements and recover dynamic textures such as pore squeezing. The industry typically relies on multi-view stereo followed by non-rigid alignment. However, this approach is prone to errors and depends heavily on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternating geometry and texture optimization frame by frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we extract dynamic facial meshes with regular wiring and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves better results than current SOTA face reconstruction methods in both mesh and texture quality. Project page: https://xuanchenli.github.io/Topo4D/

What problem does this paper attempt to address?

This paper aims to address several key challenges in 4D face reconstruction:

1. **High-quality 4D face asset reconstruction**: Existing methods have difficulty generating dynamic topological meshes and corresponding texture maps from videos. The goal of 4D head capture is to produce high-fidelity 4D face models that can simulate facial muscle movements and recover dynamic textures such as pore squeezing.
2. **Maintaining temporal consistency**: Traditional pipelines based on multi-view stereo (MVS) and non-rigid alignment are prone to errors and require time-consuming manual processing to ensure temporal consistency. They struggle to guarantee stability across frames, which leads to problems such as texture drift.
3. **Automated process**: The industry usually uses professional equipment (such as a Light Stage) to capture high-quality multi-view videos, computes a facial scan for each frame through MVS, and then performs non-rigid registration to fit topologically aligned faces onto the scans. To obtain usable assets, this process requires markers on the subject's face and manual post-processing by artists. It is time-consuming and depends on a large amount of human intervention, so a more automated process is needed to accelerate 4D asset reconstruction.

To solve these problems, the authors propose a new framework, **Topo4D**, which directly optimizes densely aligned 4D heads and 8K texture maps from calibrated multi-view time-series images. The main contributions of Topo4D are as follows:

- Propose a new optimization framework for reconstructing high-quality 4D heads and photo-realistic textures with pore-level details from multi-view videos.
- Introduce Gaussian Mesh and UV densification techniques to better represent facial models under a predefined topology and fixed UVs.
- Design an alternating geometry and texture optimization process that maintains temporal and topological consistency and a regular mesh arrangement during optimization.

Through these innovations, Topo4D can extract meshes and textures with a fixed topology while maintaining high-quality rendering, and it outperforms existing state-of-the-art methods in the reported experiments.
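The idea of binding Gaussian centers to mesh vertices and alternating between geometry and texture updates frame by frame can be illustrated with a deliberately simplified 1D toy. This is a sketch under stated assumptions, not the authors' implementation: the real method works with 3D Gaussians, a differentiable rasterizer, multi-view images, and UV texture maps. All names here (`SIGMA`, `EDGES`, `render`, `fit_frame`, `track`) are hypothetical, and the guarded gradient step is a stand-in for whatever optimizer the paper actually uses.

```python
import math

SIGMA = 2.0                              # shared Gaussian width (assumed fixed)
PIXELS = [float(x) for x in range(40)]   # 1D "image" sample positions
EDGES = [(0, 1), (1, 2)]                 # fixed topology, shared by every frame

def render(centers, colors):
    """Splat 1D Gaussians; each center doubles as a mesh vertex position."""
    return [sum(c * math.exp(-(x - p) ** 2 / (2 * SIGMA ** 2))
                for p, c in zip(centers, colors)) for x in PIXELS]

def loss(centers, colors, target):
    """Sum of squared differences between the rendering and the target."""
    return sum((i - t) ** 2 for i, t in zip(render(centers, colors), target))

def gradients(centers, colors, target):
    """Analytic gradients of the loss w.r.t. centers and colors."""
    residual = [i - t for i, t in zip(render(centers, colors), target)]
    g_p, g_c = [], []
    for p, c in zip(centers, colors):
        w = [math.exp(-(x - p) ** 2 / (2 * SIGMA ** 2)) for x in PIXELS]
        g_c.append(sum(2 * r * wi for r, wi in zip(residual, w)))
        g_p.append(sum(2 * r * c * wi * (x - p) / SIGMA ** 2
                       for r, wi, x in zip(residual, w, PIXELS)))
    return g_p, g_c

def descend(params, grad, evaluate, lr):
    """Guarded gradient step: keep it only if the loss drops, else halve lr."""
    trial = [v - lr * g for v, g in zip(params, grad)]
    if evaluate(trial) < evaluate(params):
        return trial, lr
    return params, lr * 0.5

def fit_frame(centers, colors, target, rounds=60):
    """Alternate geometry and texture updates for a single frame."""
    lr_p, lr_c = 0.1, 0.05
    for _ in range(rounds):
        # Geometry phase: move vertices (Gaussian centers), colors frozen.
        g_p, _ = gradients(centers, colors, target)
        centers, lr_p = descend(centers, g_p,
                                lambda p: loss(p, colors, target), lr_p)
        # Texture phase: update colors, geometry frozen.
        _, g_c = gradients(centers, colors, target)
        colors, lr_c = descend(colors, g_c,
                               lambda c: loss(centers, c, target), lr_c)
    return centers, colors

def track(frames, centers, colors):
    """Process frames in order, warm-starting each from the previous result."""
    results = []
    for target in frames:
        centers, colors = fit_frame(centers, colors, target)
        results.append({"vertices": list(centers),
                        "colors": list(colors),
                        "edges": EDGES})
    return results

# Synthetic two-frame "capture": the middle vertex moves and brightens.
frames = [render([8.0, 20.0, 32.0], [1.0, 0.8, 0.6]),
          render([8.0, 21.0, 32.0], [1.0, 0.9, 0.6])]
init_centers, init_colors = [7.0, 19.0, 31.0], [0.5, 0.5, 0.5]
meshes = track(frames, init_centers, init_colors)
```

Because every frame reuses the same `EDGES` list and only the vertex positions and colors change, the per-frame outputs stay in dense correspondence, which is the property the summary refers to as temporal and topological consistency.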

Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

Topology-aware Human Avatars with Semantically-guided Gaussian Splatting

Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction

Light-Weight Multi-view Topology Consistent Facial Geometry and Reflectance Capture

UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction Using Commercial RGBD Cameras

S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points

Dynamic 4D facial capture pipeline with appearance driven progressive retopology based on optical flow

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors

Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes

Full Head Performance Capture Using Multi-scale Mesh Propagation

Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting

Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping