FaceScore: Benchmarking and Enhancing Face Quality in Human Generation

Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng
2024-09-12
Abstract: Diffusion models (DMs) have achieved significant success in generating imaginative images given textual descriptions. However, they are likely to fall short when it comes to real-life scenarios with intricate details. The low-quality, unrealistic human faces in text-to-image generation are one of the most prominent issues, hindering the wide application of DMs in practice. To address this issue, we first assess the face quality of generations from popular pre-trained DMs with the aid of human annotators and then evaluate the alignment of existing metrics with human judgments. Observing that existing metrics can be unsatisfactory for quantifying face quality, we develop a novel metric named FaceScore (FS) by fine-tuning the widely used ImageReward on a dataset of (win, loss) face pairs cheaply crafted by an inpainting pipeline of DMs. Extensive studies reveal that FS enjoys a superior alignment with humans. Moreover, FS opens the door to enhancing DMs for better face generation. With FS offering image ratings, we can easily apply preference learning algorithms to refine DMs like SDXL. Comprehensive experiments verify the efficacy of our approach for improving face quality. The code is released at https://github.com/OPPO-Mente-Lab/FaceScore.
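The abstract describes fine-tuning a scoring model on (win, loss) face pairs. A common objective for training on such pairs is a Bradley-Terry style pairwise ranking loss, which pushes the score of the winning image above that of the losing one. The abstract does not state the exact loss, so the form below is an assumption, and the names `pairwise_ranking_loss` and `mean_pair_loss` are hypothetical. The sketch works on plain floats rather than a real model's outputs.

```python
import math


def pairwise_ranking_loss(score_win: float, score_loss: float) -> float:
    """Return -log(sigmoid(score_win - score_loss)).

    Assumed Bradley-Terry style objective; not confirmed as the
    exact loss used by the paper.
    """
    margin = score_win - score_loss
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_pair_loss(score_pairs):
    """Average the ranking loss over (win_score, loss_score) tuples."""
    losses = [pairwise_ranking_loss(w, l) for w, l in score_pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical scores for three (win, loss) face pairs.
    pairs = [(1.2, -0.3), (0.1, 0.4), (2.0, 2.0)]
    print(mean_pair_loss(pairs))
```

The loss is small when the winning image already scores well above the losing one and grows roughly linearly when the order is reversed, so correctly ranked pairs contribute little to the gradient.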
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor quality of human faces produced by diffusion models (DMs). Although DMs have achieved remarkable success in generating imaginative images from text descriptions, generating high-quality, realistic human faces with intricate details in real-world scenarios remains a challenge. The low-quality, unrealistic faces hinder the wide application of DMs in practice. The paper evaluates the quality of faces generated by existing pre-trained DMs and develops a new metric, FaceScore (FS). It also explores how FS can be used to improve existing DMs for better face generation.

### Main Contributions

1. **First Investigation of the Human Face Quality Problem in DMs**: Systematically evaluates a series of metrics for quantifying the quality of human faces in synthetic images.
2. **Proposing FaceScore (FS)**: Develops a new metric for evaluating the plausibility and aesthetic appeal of generated human faces and shows that it outperforms existing metrics.
3. **Using FS for Preference Learning**: Verifies the effectiveness of FS in improving face quality through Direct Preference Optimization (DPO).

### Method Overview

- **Dataset Construction**: Automatically constructs a dataset of face image pairs using the inpainting ability of pre-trained DMs, where each pair contains an original image and an inpainted version with a lower-quality face.
- **Model Fine-Tuning**: Fine-tunes a scoring model based on the BLIP architecture on the constructed dataset to produce FaceScore.
- **Evaluation and Comparison**: Verifies the advantage of FS in face quality evaluation by comparing it with existing metrics such as SER-FIQ and HPS.
- **Preference Learning**: Uses FS for preference learning, fine-tuning existing DMs with DPO to improve the quality of generated human faces.

### Experimental Results

- **Quantitative Evaluation**: FS outperforms existing metrics in face quality evaluation and aligns closely with human preferences.
- **Qualitative Analysis**: Visual comparisons show that faces generated by the model fine-tuned with FS are of noticeably higher quality than those of the baseline model.
- **Human Evaluation**: Preference studies with human annotators further support the advantage of the FS-fine-tuned model in face quality.

### Conclusion

The paper addresses the problem of low-quality human faces in DM generations by proposing a new metric, FaceScore, and by using it in preference learning to improve the quality of generated faces. These results provide references and tools for future research and applications.
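The preference-learning step summarized above can be sketched in two parts: a face-quality scorer orders two generations into a (win, loss) pair, and a DPO-style loss rewards the fine-tuned model for favoring the winner more strongly than a frozen reference model does. This is a minimal sketch of the generic DPO objective on scalar log-likelihoods, not the paper's implementation. For diffusion models the log-likelihood terms are typically replaced by a tractable surrogate, which is omitted here. The names `label_pair` and `dpo_loss`, the default `beta`, and the toy scorer are all hypothetical.

```python
import math


def label_pair(score_fn, image_a, image_b):
    """Order two images as (win, loss) using a scoring function.

    score_fn stands in for a face-quality metric such as FS;
    any callable that returns a float works.
    """
    if score_fn(image_a) >= score_fn(image_b):
        return image_a, image_b
    return image_b, image_a


def dpo_loss(logp_win, logp_loss, ref_logp_win, ref_logp_loss, beta=0.1):
    """Generic DPO loss on scalar log-likelihoods.

    loss = -log(sigmoid(beta * ((logp_win - ref_logp_win)
                                - (logp_loss - ref_logp_loss))))
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_loss - ref_logp_loss))
    # Numerically stable evaluation of -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Toy scorer: pretend each "image" is a dict carrying its own score.
    toy_score = lambda image: image["quality"]
    win, loss = label_pair(toy_score, {"quality": 0.2}, {"quality": 0.9})
    print(win, loss)
    # Hypothetical log-likelihoods under the trained and reference models.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the trained model and the reference model assign identical log-likelihoods, the margin is zero and the loss equals log 2; the loss falls below that only when the trained model prefers the winning image more than the reference does.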