SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

Georgia Baltsou, Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos

2024-04-29

Abstract: AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets primarily focus on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes like hairstyle and accessories. This narrow focus limits the diversity of the data and consequently the robustness of AI systems trained on them. This work aims to address this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy, encompassing not only demographics and biometrics but also non-permanent traits like make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality realistic images, which can be used as an evaluation set for face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the lack of diversity in existing face image datasets, in particular their neglect of non-permanent features such as hairstyle, make-up, and accessories. Most existing datasets focus on demographic factors like age, gender, and skin tone and ignore other important facial attributes. This limits the diversity of the data and the robustness of AI systems trained on it.

To tackle this, the paper proposes a method for generating a synthetic face image dataset that covers a broader range of facial diversity. The method uses a systematic prompt formulation strategy that includes demographic and biometric features as well as non-permanent features such as make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model to generate a dataset of high-quality realistic images that can serve as an evaluation set for face analysis systems. The resulting dataset, SDFD (Synthetic Diverse Face Dataset), contains 1,000 distinct face images showing people of different races, genders, and ages, with various accessories, types of make-up, and emotional expressions. Compared to existing datasets, it proves equally or more challenging in image classification tasks while being much smaller in size, which makes it a demanding test set despite its size.
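As a rough illustration of what a systematic prompt formulation strategy could look like, the sketch below combines one value per attribute category into a text prompt and samples a fixed number of distinct combinations. The attribute vocabularies, the prompt template, and the helper names `build_prompt` and `sample_prompts` are assumptions made for this example and are not taken from the paper. The text-to-image generation step is not shown.

```python
import itertools
import random

# Hypothetical attribute vocabularies for illustration only;
# the categories and values used in the paper may differ.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "elderly"],
    "gender": ["man", "woman"],
    "race": ["Asian", "Black", "White"],
    "hairstyle": ["short hair", "curly hair", "a shaved head"],
    "makeup": ["no make-up", "red lipstick"],
    "accessory": ["glasses", "a hat", "earrings"],
    "emotion": ["happy", "sad", "surprised"],
}


def build_prompt(choice):
    """Fill an assumed prompt template with one value per attribute."""
    return (
        f"A photo of the face of a {choice['emotion']} {choice['age']} "
        f"{choice['race']} {choice['gender']} with {choice['hairstyle']} "
        f"and {choice['makeup']}, wearing {choice['accessory']}"
    )


def sample_prompts(n, seed=0):
    """Return n distinct prompts drawn from all attribute combinations."""
    keys = list(ATTRIBUTES)
    # Enumerate every combination of attribute values (972 for the lists above).
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*ATTRIBUTES.values())
    ]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    # random.sample draws without replacement, so no combination repeats.
    return [build_prompt(c) for c in rng.sample(combos, n)]


if __name__ == "__main__":
    for prompt in sample_prompts(5):
        print(prompt)
```

Enumerating combinations and sampling without replacement guarantees that every prompt is distinct, which is one simple way to spread a small image budget across many attribute combinations. Each prompt would then be passed to a text-to-image model of choice.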
