Abstract: The field of image synthesis has made great strides in the last couple of years. Recent models are capable of generating images with astonishing quality. Fine-grained evaluation of these models on some interesting categories such as faces is still missing. Here, we conduct a quantitative comparison of three popular systems, Stable Diffusion, Midjourney, and DALL-E 2, in their ability to generate photorealistic faces in the wild. We find that Stable Diffusion generates better faces than the other systems, according to the FID score. We also introduce a dataset of generated faces in the wild dubbed GFW, including a total of 15,076 faces. Furthermore, we hope that our study spurs follow-up research in assessing the generative models and improving them. Data and code are available.

What problem does this paper attempt to address?

The paper compares three popular image generation models (Stable Diffusion, Midjourney, and DALL·E 2) on their ability to synthesize realistic human faces. Specifically, the study evaluates the quality of faces generated in complex scenes, rather than in images optimized specifically for portraits.

The authors conducted their experiments in three steps:

1. **Model Selection**: Three models, namely Stable Diffusion, Midjourney, and DALL·E 2, were selected for comparison.
2. **Dataset Construction**: To build a dataset of generated faces, the authors used captions from the COCO dataset as prompts to generate images and then detected faces in those images. They also collected real-world face data, including faces from the COCO training set and the Labeled Faces in the Wild (LFW) dataset.
3. **Quality Evaluation**: The Fréchet Inception Distance (FID) score was used to measure the similarity between the generated faces and real faces.

The study found that Stable Diffusion performed best in terms of the quality of generated faces: according to the FID score, it generated more realistic faces than the other two models. Even so, a significant gap remains between generated and real faces, indicating substantial room for improvement.

Future research directions may include:

- Increasing the number of face samples generated by DALL·E 2 for a more comprehensive comparison.
- Investigating whether the generation systems exhibit data memorization.
- Exploring whether the generated faces show social bias.
- Using metrics more suitable for face evaluation, such as SSIM and LPIPS.
- Conducting a more detailed analysis of facial attributes such as expression, age, and viewpoint.
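To make the FID-based evaluation step more concrete, here is a minimal sketch of the Fréchet distance between two sets of feature vectors, each modeled as a Gaussian. It is not the authors' code. In an actual FID computation the features come from a pretrained Inception network; here the function name `frechet_distance` and the random arrays standing in for features are assumptions made purely for illustration. Lower values mean the two feature distributions are closer.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features) with
    n_features >= 2. In real FID these would be Inception activations;
    that extraction step is omitted here.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances. Clip small negative round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in "features" (hypothetical data, not results from the paper).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))
shifted_feats = real_feats + 3.0  # same covariance, shifted mean

same_score = frechet_distance(real_feats, real_feats)
shifted_score = frechet_distance(real_feats, shifted_feats)
```

With identical inputs the distance is approximately zero, and shifting every feature by a constant increases it by the squared length of the shift, since the covariance terms cancel. Production implementations typically use a matrix square root routine and much higher-dimensional features.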

Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2

Single Image, Any Face: Generalisable 3D Face Generation

GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations

TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces

Controllable 3D Face Generation with Conditional Style Code Diffusion

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

FaceScore: Benchmarking and Enhancing Face Quality in Human Generation

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

Recent Progress of Face Image Synthesis

Towards Realistic Generative 3D Face Models

Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion

AvatarMe++: Facial Shape and BRDF Inference With Photorealistic Rendering-Aware GANs

Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models

DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis

Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Reconstruction

SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models

AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"

Analyzing the Feature Extractor Networks for Face Image Synthesis

Measuring the Consistency and Diversity of 3D Face Generation