Monocular Facial Appearance Capture in the Wild

Yingyan Xu, Kate Gadola, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley
2024-12-17
Abstract: We present a new method for reconstructing the appearance properties of human faces from a lightweight capture procedure in an unconstrained environment. Our method recovers the surface geometry, diffuse albedo, specular intensity and specular roughness from a monocular video containing a simple head rotation in-the-wild. Notably, we make no simplifying assumptions on the environment lighting, and we explicitly take visibility and occlusions into account. As a result, our method can produce facial appearance maps that approach the fidelity of studio-based multi-view captures, but with a far easier and cheaper procedure.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper addresses the challenge of capturing the appearance attributes of human faces from monocular video alone, recorded in an uncontrolled natural environment. Specifically, it aims to recover the facial surface geometry, diffuse albedo, specular intensity, and specular roughness from a video of a simple head rotation, without making any simplifying assumptions about the environment lighting.

### Core Problems of the Paper

1. **Capturing High-Quality Facial Appearance in a Natural Environment**:
   - Existing methods usually require multi-view capture in a controlled studio environment to obtain high-fidelity facial appearance data. These methods are effective, but the equipment is expensive and the procedure is complex.
   - The paper proposes a lightweight method that captures facial appearance attributes from a simple head-rotation video taken by a single camera in any environment, indoor or outdoor, which greatly reduces equipment and operating costs.
2. **Handling Complex Lighting Conditions**:
   - The method makes no assumptions about the environment lighting and can handle a range of lighting conditions, including direct sunlight and shaded environments.
   - By combining ray tracing and pre-filtered environment maps with visibility modulation, the method simulates lighting more accurately, especially where self-occlusion and shadows are involved.
3. **Improving Reconstruction Quality**:
   - The paper proposes a new shading model that better separates the diffuse and specular reflection components, avoiding the baking of specular signal into the diffuse map, a problem common in existing methods.
   - As a result, the reconstructed facial geometry and appearance attributes are more realistic and approach the fidelity of studio-based multi-view capture.
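
To make the role of visibility in item 2 concrete, here is a minimal Python sketch of a visibility-modulated diffuse shading estimate. It is not the paper's implementation: the function names (`sample_cosine_hemisphere`, `diffuse_radiance`, `uniform_sky`, `half_blocked`) are hypothetical, the environment is assumed to be a uniform white sky, and the visibility function is an analytic stand-in for what a ray tracer would compute against the face geometry. The estimator integrates albedo/pi times incident radiance, visibility, and the cosine term over the hemisphere around the normal.

```python
import numpy as np


def sample_cosine_hemisphere(n_samples, rng):
    """Draw cosine-weighted unit directions around the normal n = +z."""
    u1 = rng.random(n_samples)
    u2 = rng.random(n_samples)
    r = np.sqrt(u1)                 # radius on the unit disk
    phi = 2.0 * np.pi * u2          # azimuth angle
    x = r * np.cos(phi)
    y = r * np.sin(phi)
    z = np.sqrt(1.0 - u1)           # cos(theta) = omega_i . n
    return np.stack([x, y, z], axis=1)


def diffuse_radiance(albedo, env_radiance, visibility, n_samples=4096, seed=0):
    """Monte Carlo estimate of the visibility-modulated diffuse term.

    Integrand: (albedo / pi) * L_i(w) * V(w) * cos(theta).
    With cosine-weighted sampling (pdf = cos(theta) / pi) the cosine and
    the 1/pi cancel, so the estimator is albedo * mean(L_i * V).
    """
    rng = np.random.default_rng(seed)
    dirs = sample_cosine_hemisphere(n_samples, rng)
    return albedo * np.mean(env_radiance(dirs) * visibility(dirs))


# Assumed toy environment: uniform white sky with unit radiance.
def uniform_sky(dirs):
    return np.ones(len(dirs))


# Assumed toy occluder: everything on the +x side is blocked,
# loosely mimicking the nose shadowing part of the cheek.
def half_blocked(dirs):
    return (dirs[:, 0] <= 0.0).astype(float)


def fully_visible(dirs):
    return np.ones(len(dirs))


unoccluded = diffuse_radiance(0.8, uniform_sky, fully_visible)
occluded = diffuse_radiance(0.8, uniform_sky, half_blocked)
print(unoccluded, occluded)
```

Under these assumptions the unoccluded estimate equals the albedo, while the half-blocked one comes out at roughly half of it. A model that ignores visibility would have to explain that darker pixel with a lower albedo, which is the kind of error the visibility term is meant to prevent.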

### Formula Summary

- **Rendering Equation**:
  \[ L(\omega_o)=\int_{\Omega} f(x, \omega_i, \omega_o)L_i(\omega_i)(\omega_i\cdot n)d\omega_i \]
  where \(L(\omega_o)\) is the outgoing radiance in direction \(\omega_o\), \(f(x, \omega_i, \omega_o)\) is the BRDF at surface point \(x\), \(L_i(\omega_i)\) is the incident radiance from direction \(\omega_i\), and \(n\) is the surface normal.
- **Diffuse Term**:
  \[ f_d(x)=\frac{\rho(x)}{\pi} \]
  where \(\rho(x)\) is the diffuse albedo.
- **Specular Term**:
  \[ f_s(x, \omega_i, \omega_o)=\frac{DGF}{4(\omega_i\cdot n)(\omega_o\cdot n)} \]
  where \(D\), \(G\) and \(F\) are the Beckmann distribution, the geometric attenuation term and the Fresnel term, respectively.
- **Diffuse Term Considering Lighting Visibility**:
  \[ L_{\text{diffuse}}(\omega_o)=\int_{\Omega} f_d(x)L_i(\omega_i)V(x, \omega_i)(\omega_i\cdot n)d\omega_i \]
  where \(V(x, \omega_i)\) is the visibility from the surface point \(x\) along the incident light direction \(\omega_i\).
- **Specular Term Considering Lighting Visibility**:
  \[ L_{\text{specular}}(\omega_o)\approx\left(\int_{\Omega} f_s(x, \omega_i, \omega_o)(\omega_i\cdot n)d\omega_i\right)\left(\int_{\Omega} L_i(\omega_i)D\right)