ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration

Chi-Wei Hsiao, Yu-Lun Liu, Cheng-Kun Yang, Sheng-Po Kuo, Kevin Jou, Chia-Ping Chen
2024-12-06
Abstract: While recent works on blind face image restoration have successfully produced impressive high-quality (HQ) images with abundant details from low-quality (LQ) input images, the generated content may not accurately reflect the real appearance of a person. To address this problem, incorporating well-shot personal images as additional reference inputs could be a promising strategy. Inspired by the recent success of the Latent Diffusion Model (LDM), we propose ReF-LDM, an adaptation of LDM designed to generate HQ face images conditioned on one LQ image and multiple HQ reference images. Our model integrates an effective and efficient mechanism, CacheKV, to leverage the reference images during the generation process. Additionally, we design a timestep-scaled identity loss, enabling our LDM-based model to focus on learning the discriminating features of human faces. Lastly, we construct FFHQ-Ref, a dataset consisting of 20,405 HQ face images with corresponding reference images, which can serve as both training and evaluation data for reference-based face restoration models.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
The problem this paper addresses: when existing methods restore a high-quality (HQ) face image from a low-quality (LQ) one, the generated content may not accurately reflect the person's real appearance. In particular, when important facial features are degraded in the LQ input, the restored image can look like a different person. To address this, the authors use well-shot personal images as additional reference inputs to help restore more faithful facial details. They propose ReF-LDM (Reference-based Face Latent Diffusion Model), an adaptation of the Latent Diffusion Model (LDM) that generates an HQ face image conditioned on one LQ image and multiple HQ reference images.

### Main problem summary:

1. **Limitations of existing methods**:
   - Existing blind face image restoration methods can generate high-quality, detailed images, but these images may be inconsistent with the person's real appearance.
   - When the LQ input is severely degraded, the generated image may lose the person's identity characteristics.
2. **Necessity of introducing reference images**:
   - HQ reference images help the model capture and restore the person's real appearance.
   - Multiple reference images provide more comprehensive information about that appearance, such as different poses, expressions, or lighting conditions.
3. **Technical challenges**:
   - How to integrate information from multiple reference images into the generation process effectively, especially when the reference images are spatially misaligned with the target image.
   - How to ensure that the generated image is not only of high quality but also preserves the identity shared by the LQ input and the reference images.

To solve these problems, the authors propose the following innovations:

- **CacheKV mechanism**: efficiently integrates the features of multiple reference images during generation.
- **Timestep-scaled identity loss**: makes the model focus on learning the discriminating features of human faces during training.
- **FFHQ-Ref dataset**: a dataset of 20,405 HQ face images with corresponding reference images, constructed for training and evaluating reference-based face restoration models.

With these contributions, ReF-LDM can significantly improve facial identity similarity while maintaining high image quality, and thus better restores the person's real appearance.
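
The summary names the timestep-scaled identity loss but does not give its formula. As a rough illustration only, the sketch below assumes the identity loss is a cosine distance between face embeddings and that its weight is `sqrt(alpha_bar[t])`, so that it counts less at noisy timesteps where the predicted image is unreliable. The linear beta schedule, the weighting function, and every function name here are hypothetical choices for the example, not details confirmed by the source.

```python
import math


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_loss(pred_emb, target_emb):
    """Cosine distance between the restored face's and the true face's embeddings."""
    return 1.0 - cosine_similarity(pred_emb, target_emb)


def timestep_scaled_identity_loss(pred_emb, target_emb, t, alpha_bar):
    """Identity loss down-weighted at noisy timesteps.

    ASSUMPTION: the scale sqrt(alpha_bar[t]) is an illustrative choice;
    the source does not state the actual scaling function.
    """
    scale = math.sqrt(alpha_bar[t])
    return scale * identity_loss(pred_emb, target_emb)


if __name__ == "__main__":
    schedule = linear_alpha_bar()
    pred, target = [0.6, 0.8, 0.0], [0.8, 0.6, 0.0]  # toy embeddings
    for t in (0, 500, 999):
        print(t, timestep_scaled_identity_loss(pred, target, t, schedule))
```

Under these assumptions the same embedding mismatch is penalized almost fully at `t = 0` and only slightly at `t = 999`, which is one plausible way to keep the identity term from dominating training on heavily noised latents.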