GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations

Pietro Melzi, Christian Rathgeb, Ruben Tolosana, Ruben Vera-Rodriguez, Dominik Lawatsch, Florian Domin, Maxim Schaubert

2023-05-31

Abstract: Face recognition systems have significantly advanced in recent years, driven by the availability of large-scale datasets. However, several issues have recently come up, including privacy concerns that have led to the discontinuation of well-established public datasets. Synthetic datasets have emerged as a solution, even though current synthesis methods present other drawbacks such as limited intra-class variations, lack of realism, and unfair representation of demographic groups. This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition that combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets. In GANDiffFace, we first propose the use of GANs to synthesize highly realistic identities and meet target demographic distributions. Subsequently, we fine-tune Diffusion models with the images generated with GANs, synthesizing multiple images of the same identity with a variety of accessories, poses, expressions, and contexts. We generate multiple synthetic datasets by changing GANDiffFace settings, and compare their mated and non-mated score distributions with the distributions provided by popular real-world datasets for face recognition, i.e., VGG2 and IJB-C. Our results show the feasibility of the proposed GANDiffFace, in particular the use of Diffusion models to enhance the (limited) intra-class variations provided by GANs towards the level of real-world datasets.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper attempts to address several key issues present in existing synthetic datasets in facial recognition systems, including:

1. **Privacy Issues**: Large publicly available facial datasets have been discontinued due to privacy concerns, limiting the development of facial recognition technology.
2. **Limited Intra-Class Variation**: Current synthetic methods generate images with insufficient variation between different images of the same individual (i.e., limited intra-class variation), which affects the performance of facial recognition models, especially when training with synthetic data and evaluating with real data.
3. **Lack of Realism**: Synthetic images lack realism and cannot fully simulate real-world scenarios.
4. **Demographic Representation Bias**: The demographic characteristics (such as race, gender, age, etc.) in existing synthetic datasets are unevenly represented, leading to potential poor performance of models on certain groups.

To overcome these issues, the paper proposes a new framework called GANDiffFace, which combines the strengths of Generative Adversarial Networks (GANs) and Diffusion Models. This framework aims to generate synthetic datasets with high realism and rich intra-class variation, while also being able to control the target demographic distribution.
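The abstract evaluates intra-class variation by comparing mated and non-mated score distributions. As a rough illustration of what those two distributions are, the sketch below splits pairwise cosine similarities between face embeddings into mated scores (same identity) and non-mated scores (different identities). This is not the paper's evaluation code: the function name `mated_non_mated_scores`, the use of cosine similarity, and the random toy embeddings are all assumptions made for illustration; a real evaluation would use embeddings from a face recognition model.

```python
import numpy as np


def mated_non_mated_scores(embeddings, labels):
    """Split pairwise cosine similarities into mated and non-mated scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise so that the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Keep each unordered pair once (strict upper triangle, no self-pairs).
    i, j = np.triu_indices(len(labels), k=1)
    same_identity = labels[i] == labels[j]
    pair_scores = sims[i, j]
    return pair_scores[same_identity], pair_scores[~same_identity]


# Toy data (assumption): 3 identities, 4 images each, 16-D embeddings
# formed as an identity centre plus small per-image noise.
rng = np.random.default_rng(0)
centres = rng.normal(size=(3, 16))
labels = np.repeat(np.arange(3), 4)
embeddings = centres[labels] + 0.1 * rng.normal(size=(12, 16))

mated, non_mated = mated_non_mated_scores(embeddings, labels)
print("mated pairs:", mated.size, "mean score:", mated.mean())
print("non-mated pairs:", non_mated.size, "mean score:", non_mated.mean())
```

In this toy setup, increasing the noise scale spreads the mated scores toward lower values, which loosely mirrors the effect of larger intra-class variation on the mated distribution.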

Synthetic Data for the Mitigation of Demographic Biases in Face Recognition

On the use of automatically generated synthetic image datasets for benchmarking face recognition

VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition

Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion

SDFR: Synthetic Data for Face Recognition Competition

Identity-driven Three-Player Generative Adversarial Network for Synthetic-based Face Recognition

The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

Generation of Non-Deterministic Synthetic Face Datasets Guided by Identity Priors

Face Recognition Using Synthetic Face Data

TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces

SynFace: Face Recognition with Synthetic Data

GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning

How to Boost Face Recognition with StyleGAN?

Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

Synthetic Data for Face Recognition: Current State and Future Prospects

Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data

ExFaceGAN: Exploring Identity Directions in GAN's Learned Latent Space for Synthetic Identity Generation