3D Priors-Guided Diffusion for Blind Face Restoration

Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren
2024-09-12
Abstract: Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods struggle to balance realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional generative realism of the diffusion model while remaining constrained by identity-aware fidelity, we propose a novel diffusion-based framework that embeds 3D facial priors as structure and identity constraints into the denoising diffusion process. Specifically, to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) from an initial restored face image produced by a pretrained restoration network. A customized multi-level feature extraction method is employed to exploit both the structural and the identity information of 3D facial images, which are then mapped into the noise estimation process. To enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module offers a more efficient and adaptive fusion of weights for denoising, accounting for the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration. The code is released on our project page at https://github.com/838143396/3Diffusion.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses blind face restoration: recovering a high-quality face image that is both realistic and faithful to the input under complex degradation. Existing methods have difficulty balancing realism and fidelity in such scenarios, especially when it comes to keeping the facial identity consistent. To address this, the paper proposes a diffusion-based framework that embeds 3D facial prior information as structural and identity constraints, with the aim of improving both the realism and the fidelity of the restored faces.

### Main Contributions

1. **A diffusion-based face restoration network**: The network incorporates 3D facial structure information into the noise estimation process. It uses a multi-level feature extraction method to extract structure and identity information and projects that information into the latent noise space.
2. **The Time-Aware Fusion Block (TAFB)**: This block adaptively fuses facial prior features with noisy image features, following the progression of the diffusion denoising process from structural refinement to texture detail enhancement.
3. **Experimental verification**: Extensive experiments show that the proposed method outperforms existing methods on both synthetic and real-world datasets, in terms of both restored image quality and identity consistency.

### Method Overview

1. **3D Reconstruction Branch**:
   - Use the pre-trained SwinIR model to perform a preliminary restoration of the low-quality image, yielding the initial restored result \( x_{\text{init}} \).
   - Use the 3D Morphable Model (3DMM) to reconstruct the 3D facial image from \( x_{\text{init}} \) and extract 3D facial prior information, including identity, expression, texture, and illumination.
2. **Denoising Diffusion Branch**:
   - Generate a noisy image by passing the high-quality image \( x_{\text{hq}} \) through the forward diffusion process.
   - Extract multi-level features of the 3D facial image with the multi-level feature extraction module to capture structure and identity information.
   - Use the Time-Aware Fusion Block (TAFB) to fuse the 3D facial prior features with the noisy image features at each step of the denoising process.

### Experimental Results

- **Quantitative evaluation**: On the CelebA-Test dataset, the proposed method outperforms the compared methods on PSNR, SSIM, LPIPS, FID, and other metrics.
- **Qualitative analysis**: On the real-world datasets LFW-Test, WebPhoto, and WIDER-Test, the proposed method handles severely degraded images better than the compared methods, especially in maintaining facial identity consistency and structural details.

### Conclusion

By introducing 3D facial prior information, the paper addresses the trade-off between realism and fidelity in blind face restoration, especially in complex degradation scenarios. The experimental results support the effectiveness of the proposed method.
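
The two core operations of the Denoising Diffusion Branch described in the Method Overview can be illustrated with a small sketch. It is not the authors' implementation. The forward step uses the standard closed-form DDPM noising, \( x_t = \sqrt{\bar{\alpha}_t}\, x_{\text{hq}} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \). The linear beta schedule and its endpoints are assumptions. The function `time_aware_fuse` is a hypothetical, hand-written stand-in for the idea behind TAFB: a fixed sigmoid gate on the timestep that leans on the 3D prior at high noise levels (structure refinement) and on the image features at low noise levels (texture detail). The summary describes the fusion in TAFB as adaptive, so the fixed gate here only illustrates the time dependence.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, as in DDPM)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x_hq, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0 = x_hq) in closed form."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x_hq]


def time_aware_fuse(noisy_feat, prior_feat, t, num_steps, sharpness=6.0):
    """Hypothetical time-dependent blend, NOT the paper's TAFB.

    The gate approaches 1 for large t (heavy noise), weighting the 3D prior,
    and approaches 0 for small t, weighting the noisy image features.
    """
    progress = t / max(num_steps - 1, 1)
    gate = 1.0 / (1.0 + math.exp(-sharpness * (progress - 0.5)))
    return [gate * p + (1.0 - gate) * n
            for n, p in zip(noisy_feat, prior_feat)]


# Toy usage: short lists stand in for image pixels and prior features.
rng = random.Random(0)
num_steps = 1000
alpha_bars = cumulative_alphas(linear_beta_schedule(num_steps))
x_hq = [0.2, -0.5, 0.9]
x_t = forward_noise(x_hq, 500, alpha_bars, rng)
prior_feat = [0.1, -0.4, 1.0]
fused = time_aware_fuse(x_t, prior_feat, 500, num_steps)
print(x_t, fused)
```

In a real model the features would be multi-channel tensors and the fusion weights would be predicted by a network conditioned on a timestep embedding. The sketch only shows how the timestep can steer the balance between prior and image features.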