Abstract: We present a new method for reconstructing the appearance properties of human faces from a lightweight capture procedure in an unconstrained environment. Our method recovers the surface geometry, diffuse albedo, specular intensity and specular roughness from a monocular video containing a simple head rotation in the wild. Notably, we make no simplifying assumptions on the environment lighting, and we explicitly take visibility and occlusions into account. As a result, our method can produce facial appearance maps that approach the fidelity of studio-based multi-view captures, but with a far easier and cheaper procedure.

What problem does this paper attempt to address?

The paper addresses the challenge of capturing human facial appearance attributes from a monocular video recorded in an uncontrolled, natural environment. Specifically, it recovers the facial surface geometry, diffuse albedo, specular intensity, and specular roughness from a simple head-rotation video, without making simplifying assumptions about the environment lighting.

### Core Problems of the Paper

1. **Capturing high-quality facial appearance in a natural environment**:
   - Existing methods usually require multi-view capture in a controlled studio to obtain high-fidelity facial appearance data. Although effective, these setups need expensive equipment and complex operation.
   - This paper proposes a lightweight alternative: a single camera records a simple head rotation in any environment, indoor or outdoor, which greatly reduces equipment and operating costs.
2. **Handling complex lighting conditions**:
   - The method makes no assumptions about the environment lighting and handles a wide range of conditions, including direct sunlight and shadowed environments.
   - It combines ray tracing with pre-filtered environment maps modulated by visibility, which lets it simulate lighting more accurately, in particular self-occlusion and shadowing.
3. **Improving reconstruction quality**:
   - The paper proposes a new shading model that better separates the diffuse and specular components, avoiding a common failure of existing methods in which specular signal leaks into the diffuse albedo map.
   - As a result, the reconstructed facial geometry and appearance maps are more realistic and approach the quality of studio-level multi-view capture.
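To make the role of visibility in the diffuse shading concrete, here is a minimal Monte Carlo sketch of the visibility-weighted diffuse integral. It assumes distant lighting given as a Python callable `incident(w)` and a callable `visibility(w)` standing in for the ray-traced occlusion test against the head geometry; these names, `diffuse_radiance`, and the use of cosine-weighted hemisphere sampling are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def sample_cosine_hemisphere(n: Vec3, rng: random.Random) -> Vec3:
    """Draw a direction around the unit normal n with pdf (w . n) / pi."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
    # Build an orthonormal frame (t, b, n) and map the local sample to world space.
    helper: Vec3 = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = _normalize(_cross(helper, n))
    b = _cross(n, t)
    return (x * t[0] + y * b[0] + z * n[0],
            x * t[1] + y * b[1] + z * n[1],
            x * t[2] + y * b[2] + z * n[2])


def diffuse_radiance(albedo: float, n: Vec3,
                     incident: Callable[[Vec3], float],
                     visibility: Callable[[Vec3], float],
                     num_samples: int = 4096, seed: int = 0) -> float:
    """Monte Carlo estimate of  int (albedo / pi) L_i(w) V(w) (w . n) dw.

    With cosine-weighted sampling, the (w . n) / pi factor cancels against
    the pdf, so each sample contributes albedo * L_i(w) * V(w).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        w = sample_cosine_hemisphere(n, rng)
        # V(w) = 0 where a ray from the surface point would hit the geometry.
        total += incident(w) * visibility(w)
    return albedo * total / num_samples
```

For a uniform environment of radiance 1 with nothing occluded, the estimate equals the albedo exactly; making half of the hemisphere invisible (for example `visibility = lambda w: 1.0 if w[0] < 0 else 0.0` with normal `(0, 0, 1)`) roughly halves it, which is the darkening that an occlusion-unaware model would wrongly attribute to a lower albedo.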
### Formula Summary

- **Rendering equation**:
  \[ L(\omega_o)=\int_{\Omega} f(x, \omega_i, \omega_o)\,L_i(\omega_i)\,(\omega_i\cdot n)\,d\omega_i \]
  where \(L(\omega_o)\) is the outgoing radiance in direction \(\omega_o\), \(f(x, \omega_i, \omega_o)\) is the BRDF at surface point \(x\), \(L_i(\omega_i)\) is the incident radiance from direction \(\omega_i\), \(n\) is the surface normal, and \(\Omega\) is the hemisphere around \(n\).
- **Diffuse term**:
  \[ f_d(x)=\frac{\rho(x)}{\pi} \]
  where \(\rho(x)\) is the diffuse albedo.
- **Specular term**:
  \[ f_s(x, \omega_i, \omega_o)=\frac{D\,G\,F}{4(\omega_i\cdot n)(\omega_o\cdot n)} \]
  where \(D\), \(G\), and \(F\) are the Beckmann distribution, the geometric attenuation term, and the Fresnel term, respectively.
- **Diffuse term with light visibility**:
  \[ L_{\text{diffuse}}(\omega_o)=\int_{\Omega} f_d(x)\,L_i(\omega_i)\,V(x, \omega_i)\,(\omega_i\cdot n)\,d\omega_i \]
  where \(V(x, \omega_i)\) is the visibility from surface point \(x\) along the incident direction \(\omega_i\).
- **Specular term with light visibility** (split-sum approximation):
  \[ L_{\text{specular}}(\omega_o)\approx\left(\int_{\Omega} f_s(x, \omega_i, \omega_o)\,(\omega_i\cdot n)\,d\omega_i\right)\left(\frac{\int_{\Omega} L_i(\omega_i)\,D(\omega_i, \omega_o)\,(\omega_i\cdot n)\,d\omega_i}{\int_{\Omega} D(\omega_i, \omega_o)\,(\omega_i\cdot n)\,d\omega_i}\right) \]
  where the second factor is the incident lighting pre-filtered by the specular lobe \(D\), i.e. a pre-filtered environment map, which the method modulates by visibility.

Monocular Facial Appearance Capture in the Wild
