Abstract: While recent works on blind face image restoration have successfully produced impressive high-quality (HQ) images with abundant details from low-quality (LQ) input images, the generated content may not accurately reflect the real appearance of a person. To address this problem, incorporating well-shot personal images as additional reference inputs could be a promising strategy. Inspired by the recent success of the Latent Diffusion Model (LDM), we propose ReF-LDM, an adaptation of LDM designed to generate HQ face images conditioned on one LQ image and multiple HQ reference images. Our model integrates an effective and efficient mechanism, CacheKV, to leverage the reference images during the generation process. Additionally, we design a timestep-scaled identity loss, enabling our LDM-based model to focus on learning the discriminating features of human faces. Lastly, we construct FFHQ-Ref, a dataset consisting of 20,405 HQ face images with corresponding reference images, which can serve as both training and evaluation data for reference-based face restoration models.
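As a rough illustration of the timestep-scaled identity loss mentioned in the abstract, the sketch below weights a face-embedding distance by a factor that depends on the diffusion timestep. The abstract does not give the formula, so everything specific here is an assumption: the cosine-distance form of the identity loss, the scale factor `sqrt(alpha_bar_t)`, and the function names are all hypothetical. The intuition, under these assumptions, is that at very noisy timesteps the model's estimate of the clean image is unreliable, so its identity error is down-weighted.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identity_loss(pred_embedding, gt_embedding):
    """Assumed identity loss: one minus cosine similarity of face embeddings."""
    return 1.0 - cosine_similarity(pred_embedding, gt_embedding)

def timestep_scaled_identity_loss(pred_embedding, gt_embedding, alpha_bar_t):
    """Identity loss weighted by an assumed timestep-dependent factor.

    alpha_bar_t is the cumulative noise-schedule product at timestep t
    (close to 1 for nearly clean inputs, close to 0 for nearly pure noise).
    The sqrt(alpha_bar_t) weighting is an assumption made for illustration,
    not a formula taken from the abstract.
    """
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in [0, 1]")
    return math.sqrt(alpha_bar_t) * identity_loss(pred_embedding, gt_embedding)
```

In a real training loop the embeddings would come from a pretrained face-recognition network applied to the predicted and ground-truth images; here they are plain lists so the weighting itself is easy to inspect.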

What problem does this paper attempt to address?

The problem this paper addresses is that, when restoring high-quality (HQ) face images from low-quality (LQ) face images, the content generated by existing methods may not accurately reflect the person's real appearance. Specifically, when important features in the LQ input are damaged, the reconstructed image may look like a different person. To solve this problem, the authors propose a reference-based method that uses well-shot personal images as additional reference inputs to help restore more faithful facial details. For this purpose, they propose ReF-LDM (Reference-based Face Latent Diffusion Model), an adaptation of the Latent Diffusion Model (LDM) that generates an HQ face image from one LQ image and multiple HQ reference images.

### Main problem summary:

1. **Limitations of existing methods**:
   - Although existing blind face image restoration methods can generate high-resolution images, these images may be inconsistent with the real appearance of the original person.
   - When the input LQ image is severely degraded, the generated image may lose the identity characteristics of the person.
2. **Necessity of introducing reference images**:
   - Using HQ reference images can help the model better capture and restore the person's real appearance.
   - Multiple reference images can provide more comprehensive information about the person's appearance, such as different poses, expressions, or lighting conditions.
3. **Technical challenges**:
   - How to effectively integrate the information from multiple reference images into the generation process, especially when the reference images are spatially misaligned with the target image.
   - How to ensure that the generated image is not only of high quality but also consistent with the identity of the person in the LQ input image and the reference images.

To solve these problems, the authors propose the following innovations:

- **CacheKV mechanism**: used to efficiently integrate the features of multiple reference images.
- **Timestep-scaled identity loss**: makes the model pay more attention to learning the distinguishing features of human faces during training.
- **FFHQ-Ref dataset**: a dataset of 20,405 HQ face images with corresponding reference images, constructed for training and evaluating reference-based face restoration models.

Through these improvements, ReF-LDM can significantly improve facial identity similarity while maintaining high image quality, thus better restoring the real appearance of the person.
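One way to picture how cached reference features might enter the generation process is through attention: keys and values computed once from each reference image are stored and then appended to the keys and values of the image being restored. The sketch below illustrates that general idea only. The summary does not describe how CacheKV is implemented, so the class `ReferenceKVCache`, the function `attend_with_references`, and the projection callables are hypothetical names, and plain Python lists stand in for feature tensors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

class ReferenceKVCache:
    """Hypothetical cache holding keys/values computed once per reference image."""

    def __init__(self):
        self.keys = []
        self.values = []

    def add_reference(self, ref_tokens, project_k, project_v):
        # Reference features are projected a single time and then reused,
        # rather than being recomputed at every denoising step.
        self.keys.extend(project_k(t) for t in ref_tokens)
        self.values.extend(project_v(t) for t in ref_tokens)

def attend_with_references(target_tokens, cache, project_q, project_k, project_v):
    """Attend from target tokens to target tokens plus cached reference tokens."""
    q = [project_q(t) for t in target_tokens]
    k = [project_k(t) for t in target_tokens] + cache.keys
    v = [project_v(t) for t in target_tokens] + cache.values
    return attention(q, k, v)
```

Because each query is compared against every cached key regardless of its position, this kind of mechanism does not require the reference images to be spatially aligned with the target, which is one plausible reason an attention-based design suits the misalignment challenge listed above.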

ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
