OFER: Occluded Face Expression Reconstruction

Pratheba Selvaraju, Victoria Fernandez Abrevaya, Timo Bolkart, Rick Akkerman, Tianyu Ding, Faezeh Amjadi, Ilya Zharkov
2024-10-29
Abstract: Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity, where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to generate the shape and expression coefficients of a face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. Although this addresses the ambiguity problem, the challenge remains to pick the best matching shape to ensure consistency across diverse expressions. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on the predicted shape accuracy scores to select the best match. We evaluate our method using standard benchmarks and introduce CO-545, a new protocol and dataset designed to assess the accuracy of expressive faces under occlusion. Our results show improved performance over occlusion-based methods, with added ability to generate multiple expressions for a given image.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the reconstruction of a 3D face model from a single image in the presence of occlusion. Occlusion hides parts of the face, which introduces additional uncertainty and means that several reconstructions can be equally valid. The task is therefore essentially a multi-hypothesis reconstruction problem. Although occlusion is very common in real-world scenarios, few methods handle its multi-hypothesis nature.

### Specific challenges of the problem:

1. **Uncertainty brought by occlusion**: The occluded area can correspond to an infinite number of valid shapes, making the reconstruction task more complex.
2. **Multi-hypothesis reconstruction**: Multiple possible reconstructions must be generated to account for the uncertainty introduced by occlusion.
3. **Consistency problem**: When generating multiple expressions, the underlying geometric structure (the face shape) must stay consistent.

### Solutions proposed in the paper:

To address these problems, the paper proposes a new method named OFER (Occluded Face Expression Reconstruction). Its main contributions are:

1. **Using diffusion models to generate diverse 3D faces**: Two diffusion models (DDPMs) are trained to generate, respectively, the shape and expression coefficients of the FLAME parametric face model, conditioned on the input image. This captures the multi-modal nature of the problem and yields multiple possible reconstructions.
2. **Novel ranking mechanism**: A ranking mechanism sorts the samples generated by the shape diffusion network by their predicted shape accuracy scores and selects the best match, so that the shape stays consistent across diverse expressions.
3. **New dataset and evaluation protocol**: A new dataset, CO-545, and its evaluation protocol are introduced to assess the accuracy of expressive face reconstruction under occlusion.
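The idea of drawing several coefficient vectors from a conditional DDPM can be illustrated with a minimal sketch of the standard DDPM reverse (ancestral) sampling loop. Everything here is an assumption made for illustration: `toy_noise_predictor` is a hypothetical stand-in for a trained network, the conditioning vector stands in for image features, and the linear schedule, step count, and dimensions are arbitrary. It is not the paper's architecture or training setup.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_noise_predictor(x, t, cond):
    """Hypothetical stand-in for a trained eps_theta(x_t, t, cond).

    Returns x - cond, so each reverse step nudges x slightly toward cond.
    """
    return [xi - ci for xi, ci in zip(x, cond)]


def ddpm_sample(cond, dim, num_steps=50, rng=None):
    """Draw one coefficient vector with the standard DDPM reverse update."""
    rng = rng or random.Random()
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]

    # Cumulative products alpha_bar_t = prod_{s<=t} alpha_s.
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add noise on every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    cond = [0.5, -0.2, 0.1, 0.0]  # hypothetical image-derived conditioning
    samples = [ddpm_sample(cond, dim=4, rng=random.Random(s)) for s in range(3)]
    for s in samples:
        print([round(v, 3) for v in s])
```

Calling `ddpm_sample` repeatedly with different random states gives different vectors for the same conditioning input, which is the property that lets a diffusion model return a set of hypotheses rather than a single reconstruction.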
### Method overview:

- **Identity Generative Network (IdGen)**: Generates a set of FLAME shape coefficients that captures the diversity of plausible shapes for the occluded area.
- **Identity Ranking Network (IdRank)**: Scores and ranks the generated shape samples and selects the shape that best fits the input image.
- **Expression Generative Network (ExpGen)**: Generates diverse expression coefficients, which are combined with the selected shape to produce the final 3D face reconstructions.

With these components, OFER can generate multiple plausible and diverse 3D face reconstructions under severe occlusion. The authors report improved performance over existing occlusion-based methods on standard benchmarks and on CO-545.
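The three stages above can be sketched as a small pipeline. This is only an illustration of the control flow (sample shapes, rank, keep the best, pair it with several expressions). The functions `id_gen`, `id_rank`, and `exp_gen` are hypothetical stubs named after the components, the "image feature" is a plain list of numbers, and the scoring rule is made up. None of it reflects the actual networks or FLAME dimensions.

```python
import random


def id_gen(image_feat, num_samples, rng):
    """Hypothetical IdGen stub: noisy candidate shape vectors around the feature."""
    return [[f + rng.gauss(0.0, 0.1) for f in image_feat] for _ in range(num_samples)]


def id_rank(image_feat, candidates):
    """Hypothetical IdRank stub: higher score means a predicted better fit.

    Uses negative squared distance to the image feature as a made-up score.
    """
    return [-sum((c - f) ** 2 for c, f in zip(cand, image_feat)) for cand in candidates]


def exp_gen(image_feat, num_samples, exp_dim, rng):
    """Hypothetical ExpGen stub: random expression vectors (ignores the image)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(exp_dim)] for _ in range(num_samples)]


def reconstruct(image_feat, n_shapes=8, n_exprs=3, exp_dim=5, seed=0):
    """Sample shape candidates, keep the top-ranked one, pair it with expressions."""
    rng = random.Random(seed)
    candidates = id_gen(image_feat, n_shapes, rng)
    scores = id_rank(image_feat, candidates)

    # Sort candidates by predicted score, best first, and keep the top one.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    best_shape = ranked[0][1]

    # Every output shares the same selected shape; only the expression varies.
    expressions = exp_gen(image_feat, n_exprs, exp_dim, rng)
    return [{"shape": best_shape, "expression": e} for e in expressions]


if __name__ == "__main__":
    for face in reconstruct([0.1, 0.2, 0.3]):
        print(face)
```

Selecting a single shape before sampling expressions is what keeps identity fixed across the outputs: the results differ only in their expression coefficients.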