Abstract: Diffusion models (DMs) have achieved significant success in generating imaginative images given textual descriptions. However, they are likely to fall short when it comes to real-life scenarios with intricate details. Low-quality, unrealistic human faces are one of the most prominent issues in text-to-image generation, hindering the wide application of DMs in practice. To address this issue, we first assess the face quality of generations from popular pre-trained DMs with the aid of human annotators and then evaluate the alignment between existing metrics and human judgments. Observing that existing metrics can be unsatisfactory for quantifying face quality, we develop a novel metric named FaceScore (FS) by fine-tuning the widely used ImageReward on a dataset of (win, loss) face pairs cheaply crafted by an inpainting pipeline of DMs. Extensive studies reveal that FS enjoys a superior alignment with humans. Moreover, FS opens the door to enhancing DMs for better face generation. With FS offering image ratings, we can easily apply preference learning algorithms to refine DMs like SDXL. Comprehensive experiments verify the efficacy of our approach for improving face quality. The code is released at https://github.com/OPPO-Mente-Lab/FaceScore.
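The fine-tuning on (win, loss) face pairs mentioned in the abstract can be illustrated with a pairwise ranking objective. The abstract does not state the exact loss, so the Bradley-Terry-style form below, `-log(sigmoid(s_win - s_loss))`, is an assumption based on how reward models are commonly trained. The function names are hypothetical, and plain floats stand in for the scores a real scoring network would produce.

```python
import math
from typing import Iterable, Tuple


def pairwise_ranking_loss(score_win: float, score_loss: float) -> float:
    """Return -log(sigmoid(score_win - score_loss)), computed stably.

    The loss is small when the preferred ("win") image scores well above
    the rejected ("loss") image, and large when the order is reversed.
    """
    margin = score_win - score_loss
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the pairwise loss over (score_win, score_loss) tuples."""
    losses = [pairwise_ranking_loss(w, l) for w, l in score_pairs]
    return sum(losses) / len(losses)


# Toy scores only: a correctly ordered pair and a mis-ordered pair.
example_pairs = [(1.5, -0.5), (0.2, 0.9)]
example_loss = mean_pairwise_loss(example_pairs)
```

In an actual training loop the two scores would come from the same network applied to the original face and its degraded counterpart, and the loss would be back-propagated through that network.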

What problem does this paper attempt to address?

The paper addresses the poor quality of human faces generated by diffusion models (DMs). Although DMs have achieved remarkable success in generating imaginative images from text descriptions, producing high-quality, realistic human faces with intricate details in real-world scenarios remains a challenge. Low-quality, unrealistic faces have hindered the wide application of DMs in practice. The paper tackles this problem by evaluating the quality of faces generated by existing pre-trained DMs and by developing a new metric, FaceScore (FS). It also explores how FS can be used to improve existing DMs so that they generate better faces.

### Main Contributions

1. **First Investigation of the Human Face Quality Problem in DMs**: Systematically evaluated a series of metrics for quantifying the quality of human faces in synthetic images.
2. **Proposing FaceScore (FS)**: Developed a reliable new metric for evaluating the plausibility and aesthetic appeal of generated human faces and demonstrated that it outperforms existing metrics.
3. **Using FS for Preference Learning**: Verified the effectiveness of FS in improving the quality of human faces through the Direct Preference Optimization (DPO) method.

### Method Overview

- **Dataset Construction**: Automatically constructed a dataset of face image pairs using the inpainting ability of pre-trained DMs, where each pair contains an original image and an inpainted, lower-quality version of its face.
- **Model Fine-Tuning**: Fine-tuned a BLIP-based scoring model on the constructed dataset to produce FaceScore.
- **Evaluation and Comparison**: Verified the superiority of FS in face quality evaluation by comparing it with existing metrics such as SER-FIQ and HPS.
- **Preference Learning**: Used FS for preference learning and fine-tuned existing DMs through the DPO method to improve the quality of generated faces.

### Experimental Results

- **Quantitative Evaluation**: FS performs better than existing metrics in face quality evaluation and is highly consistent with human preferences.
- **Qualitative Analysis**: Visual comparisons show that faces generated by the model fine-tuned with FS are significantly better than those of the baseline model.
- **Human Evaluation**: Preference studies with human annotators further verify the advantage of the model fine-tuned with FS in face quality.

### Conclusion

The paper tackles the problem of low-quality face generation in DMs by proposing a new metric, FaceScore, and by using it in preference learning to improve the quality of generated faces. These results provide references and tools for future research and applications.
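The preference-learning step in the method overview can be sketched as follows: score several candidate images per prompt, keep the best and worst as a (win, loss) pair, and evaluate a DPO-style loss on that pair. This is a minimal sketch under stated assumptions, not the paper's implementation. `score_fn` is a hypothetical stand-in for a FaceScore-like scorer, `build_preference_pairs` and `dpo_loss` are illustrative names, and the loss is the generic DPO objective on log-likelihoods. The summary does not describe how the paper adapts DPO to diffusion models.

```python
import math
from typing import Callable, Dict, List, Tuple


def build_preference_pairs(
    candidates: Dict[str, List[str]],
    score_fn: Callable[[str, str], float],  # hypothetical scorer: (prompt, image) -> score
    min_gap: float = 0.0,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, win_image, loss_image) triples, one per prompt at most.

    For each prompt, the highest-scoring image is the "win" and the
    lowest-scoring image is the "loss". Pairs whose score gap does not
    exceed min_gap are skipped as uninformative.
    """
    pairs = []
    for prompt, images in candidates.items():
        if len(images) < 2:
            continue
        scored = sorted((score_fn(prompt, img), img) for img in images)
        (low_score, loss_img), (high_score, win_img) = scored[0], scored[-1]
        if high_score - low_score > min_gap:
            pairs.append((prompt, win_img, loss_img))
    return pairs


def dpo_loss(
    logp_win: float,
    logp_loss: float,
    ref_logp_win: float,
    ref_logp_loss: float,
    beta: float = 0.1,
) -> float:
    """Generic DPO loss for one pair: -log(sigmoid(beta * margin)).

    margin compares how much more the trained model favors the win sample
    over the loss sample than the frozen reference model does.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_loss - ref_logp_loss))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy usage with made-up file names and scores.
toy_scores = {"a.png": 0.2, "b.png": 0.9, "c.png": 0.5}
toy_pairs = build_preference_pairs(
    {"a portrait photo": ["a.png", "b.png", "c.png"]},
    score_fn=lambda prompt, image: toy_scores[image],
)
```

Taking only the best and worst candidate per prompt is one simple design choice; it maximizes the score gap within each pair at the cost of discarding the intermediate candidates.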

FaceScore: Benchmarking and Enhancing Face Quality in Human Generation

FaceChain: A Playground for Identity-Preserving Portrait Generation

Face Image Quality Assessment for Model and Human Perception

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

Improving face generation quality and prompt following with synthetic captions

OSDFace: One-Step Diffusion Model for Face Restoration

FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

FacEnhance: Facial Expression Enhancing with Recurrent DDPMs

Face Image Quality Assessment Based on Learning to Rank

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Face beautification: Beyond makeup transfer

Automatic Face Image Quality Prediction

GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution

AnyFace: Free-style Text-to-Face Synthesis and Manipulation

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via Denoised Score Distillation