Abstract: Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity, where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to generate the shape and expression coefficients of a face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. Although this addresses the ambiguity problem, the challenge remains to pick the best matching shape to ensure consistency across diverse expressions. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on the predicted shape accuracy scores to select the best match. We evaluate our method using standard benchmarks and introduce CO-545, a new protocol and dataset designed to assess the accuracy of expressive faces under occlusion. Our results show improved performance over occlusion-based methods, with added ability to generate multiple expressions for a given image.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to reconstruct a 3D face model from a single image in the presence of occlusion. Specifically, occlusion makes some areas of the face invisible, thus introducing additional uncertainty and potentially resulting in multiple equally valid reconstruction results. Therefore, this problem is essentially a multi-hypothesis reconstruction problem. Although occlusion is very common in real-world scenarios, few methods can handle its multi-hypothesis nature.

### Specific challenges of the problem:

1. **Uncertainty brought by occlusion**: The occluded area can correspond to an infinite number of valid shapes, making the reconstruction task more complex.
2. **Multi-hypothesis reconstruction**: It is necessary to generate multiple possible reconstruction results to deal with the uncertainty brought by occlusion.
3. **Consistency problem**: While generating multiple expressions, the underlying geometric structure must remain consistent.

### Solutions proposed in the paper:

To solve the above problems, the paper proposes a new method named OFER (Occluded Face Expression Reconstruction). The main contributions of this method include:

1. **Using diffusion models to generate diverse 3D faces**: By training two diffusion models (DDPMs) to generate the shape and expression coefficients of the FLAME-parameterized face model respectively, the multimodal characteristics of the data are captured and multiple possible reconstruction results are generated.
2. **Novel ranking mechanism**: A new ranking mechanism is proposed to evaluate and select the optimal solution among the samples generated by the shape diffusion network to ensure the consistency and accuracy of the reconstruction.
3. **New dataset and evaluation protocol**: A new dataset, CO-545, and its evaluation protocol are introduced, which are specifically used to evaluate the face reconstruction performance under occlusion conditions.
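To make the multi-hypothesis idea concrete, here is a minimal sketch of DDPM ancestral sampling over a small coefficient vector. It is not the paper's network: the noise predictor is a closed-form stand-in for a toy target with two equally valid "shapes" (`MODES`), there is no image conditioning, and the schedule, dimensions, and all names are assumptions chosen for illustration.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions, not from the paper).
T = 200                                   # number of diffusion steps
BETAS = np.linspace(1e-4, 0.02, T)        # linear noise schedule
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)
# Two equally plausible coefficient vectors, standing in for the ambiguity
# that an occluded region leaves open.
MODES = np.array([[1.0, 1.0, 1.0, 1.0],
                  [-1.0, -1.0, -1.0, -1.0]])


def toy_noise_predictor(x, t):
    """Closed-form E[eps | x_t] for data concentrated on MODES.

    Stands in for a trained, image-conditioned denoiser.
    """
    a_bar = ALPHA_BARS[t]
    # diff[n, k] = x[n] - sqrt(a_bar) * MODES[k]
    diff = x[:, None, :] - np.sqrt(a_bar) * MODES[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1) / (2.0 * (1.0 - a_bar))
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(weights[:, :, None] * diff, axis=1) / np.sqrt(1.0 - a_bar)


def sample_hypotheses(num_samples, rng):
    """DDPM ancestral sampling: start from noise, denoise step by step."""
    x = rng.standard_normal((num_samples, MODES.shape[1]))
    for t in range(T - 1, -1, -1):
        eps = toy_noise_predictor(x, t)
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps) / np.sqrt(ALPHAS[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(BETAS[t]) * noise
    return x


hypotheses = sample_hypotheses(num_samples=8, rng=np.random.default_rng(0))
```

With this toy target, each row of `hypotheses` should settle near one of the two modes, and which one depends on the noise drawn. This is the sense in which a diffusion sampler returns a set of candidate reconstructions instead of a single estimate.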
### Method overview:

- **Identity Generative Network (IdGen)**: Generates a set of FLAME shape coefficients to capture the diversity of the occluded area.
- **Identity Ranking Network (IdRank)**: Scores and ranks the generated shape samples and selects the shape that best fits the input image.
- **Expression Generative Network (ExpGen)**: Generates diverse expression coefficients, which are combined with the selected shape to produce the final 3D face reconstruction result.

Through these components, OFER can generate multiple plausible and diverse 3D face reconstructions in the presence of severe occlusion, and the paper reports improved performance over occlusion-based methods on standard benchmarks.
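The three-stage flow above can be sketched as follows. This is only a structural outline under stated assumptions: `generate_shapes`, `score_shapes`, and `generate_expressions` are hypothetical stand-ins for IdGen, IdRank, and ExpGen, the coefficient sizes are illustrative, and the stub bodies use random numbers and a simple distance score instead of trained networks.

```python
import numpy as np

SHAPE_DIM = 300   # assumed number of shape coefficients (illustrative)
EXPR_DIM = 50     # assumed number of expression coefficients (illustrative)


def generate_shapes(image_feat, num_samples, rng):
    """Stand-in for IdGen: draw candidate shape coefficient vectors."""
    return rng.standard_normal((num_samples, SHAPE_DIM))


def score_shapes(image_feat, shapes):
    """Stand-in for IdRank: one score per candidate, higher is better.

    Here the score is the negative distance to a vector derived from the
    image feature; a real ranker would be a learned network.
    """
    reference = np.resize(image_feat, SHAPE_DIM)
    return -np.linalg.norm(shapes - reference, axis=1)


def generate_expressions(image_feat, num_samples, rng):
    """Stand-in for ExpGen: draw diverse expression coefficient vectors."""
    return rng.standard_normal((num_samples, EXPR_DIM))


def reconstruct(image_feat, num_shapes=10, num_expressions=5, seed=0):
    rng = np.random.default_rng(seed)
    shapes = generate_shapes(image_feat, num_shapes, rng)
    scores = score_shapes(image_feat, shapes)
    best_shape = shapes[np.argmax(scores)]        # keep the top-ranked identity
    expressions = generate_expressions(image_feat, num_expressions, rng)
    # Pair the single selected shape with every expression sample, so the
    # identity stays fixed while the expression varies.
    shapes_repeated = np.tile(best_shape, (num_expressions, 1))
    return np.concatenate([shapes_repeated, expressions], axis=1), scores


image_feat = np.linspace(-1.0, 1.0, 16)   # placeholder for an image embedding
faces, scores = reconstruct(image_feat)
```

Each row of `faces` holds one shape-plus-expression coefficient vector. Selecting a single shape before sampling expressions is what keeps the identity consistent across the returned hypotheses.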

OFER: Occluded Face Expression Reconstruction
