GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations

Pietro Melzi, Christian Rathgeb, Ruben Tolosana, Ruben Vera-Rodriguez, Dominik Lawatsch, Florian Domin, Maxim Schaubert
2023-05-31
Abstract: Face recognition systems have significantly advanced in recent years, driven by the availability of large-scale datasets. However, several issues have recently come up, including privacy concerns that have led to the discontinuation of well-established public datasets. Synthetic datasets have emerged as a solution, even though current synthesis methods present other drawbacks such as limited intra-class variations, lack of realism, and unfair representation of demographic groups. This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition that combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets. In GANDiffFace, we first propose the use of GANs to synthesize highly realistic identities and meet target demographic distributions. Subsequently, we fine-tune Diffusion models with the images generated with GANs, synthesizing multiple images of the same identity with a variety of accessories, poses, expressions, and contexts. We generate multiple synthetic datasets by changing GANDiffFace settings, and compare their mated and non-mated score distributions with the distributions provided by popular real-world datasets for face recognition, i.e., VGG2 and IJB-C. Our results show the feasibility of the proposed GANDiffFace, in particular the use of Diffusion models to enhance the (limited) intra-class variations provided by GANs towards the level of real-world datasets.
Computer Vision and Pattern Recognition
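The abstract evaluates each synthetic dataset by comparing its mated (same identity) and non-mated (different identity) score distributions with those of VGG2 and IJB-C. The sketch below shows one way such distributions can be built from face embeddings. It is a minimal illustration under stated assumptions: cosine similarity as the comparison score, toy hand-written embeddings in place of a real face recognition network, and the decidability index d' as a summary of how well the two distributions separate. None of these choices is confirmed by the summary as the paper's exact procedure.

```python
import itertools
import math
from statistics import mean, pvariance


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_distributions(embeddings, labels):
    """Split all pairwise scores into mated and non-mated lists.

    embeddings: one vector per image (assumed: output of some face
    recognition network); labels: identity label for each image.
    """
    mated, non_mated = [], []
    for i, j in itertools.combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            mated.append(score)
        else:
            non_mated.append(score)
    return mated, non_mated


def decidability(mated, non_mated):
    """Decidability index d': distance between the two means in pooled-std units."""
    spread = math.sqrt((pvariance(mated) + pvariance(non_mated)) / 2)
    if spread == 0:
        raise ValueError("both score distributions have zero variance")
    return abs(mean(mated) - mean(non_mated)) / spread


# Hypothetical toy embeddings: two identities, two images each.
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
labels = ["id_a", "id_a", "id_b", "id_b"]
mated, non_mated = score_distributions(embeddings, labels)
print(len(mated), len(non_mated))  # 2 4
print(decidability(mated, non_mated))
```

Under this framing, a dataset with richer intra-class variation produces a wider, lower-shifted mated distribution, which moves it closer to the overlap seen in real-world data; a very high d' on synthetic data suggests that images of the same identity are unrealistically similar.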
What problem does this paper attempt to address?
The paper attempts to address several key issues of existing synthetic datasets for face recognition:

1. **Privacy Issues**: Large publicly available face datasets have been discontinued due to privacy concerns, limiting the development of face recognition technology.
2. **Limited Intra-Class Variation**: Current synthesis methods generate images of the same individual that vary too little (i.e., limited intra-class variation), which hurts the performance of face recognition models, especially when they are trained on synthetic data and evaluated on real data.
3. **Lack of Realism**: Synthetic images lack realism and cannot fully reproduce real-world conditions.
4. **Demographic Representation Bias**: Demographic attributes (such as race, gender, and age) are unevenly represented in existing synthetic datasets, which can lead to poor model performance on certain groups.

To overcome these issues, the paper proposes GANDiffFace, a framework that combines the strengths of Generative Adversarial Networks (GANs) and Diffusion models. It aims to generate synthetic datasets with high realism and rich intra-class variation while controlling the target demographic distribution.
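The summary states that GANDiffFace uses GANs to produce identities that meet a target demographic distribution, but it does not spell out the selection mechanism. One simple way to enforce such a target is quota-based selection. First, convert the target proportions into integer identity counts with largest-remainder rounding. Then accept GAN-generated candidates until each group's quota is full. In the sketch below, the group names are hypothetical, and the group label of each candidate is assumed to come from some demographic classifier that is not specified here. The paper's actual procedure may differ.

```python
def allocate_quotas(target_shares, n_identities):
    """Convert target demographic proportions into integer identity counts.

    Uses largest-remainder rounding so the counts sum exactly to n_identities.
    """
    total = sum(target_shares.values())
    exact = {g: n_identities * s / total for g, s in target_shares.items()}
    quotas = {g: int(v) for g, v in exact.items()}
    leftover = n_identities - sum(quotas.values())
    # Hand the remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


def select_identities(candidates, quotas):
    """Keep GAN-generated candidates until every group's quota is filled.

    candidates: iterable of (identity_id, group) pairs; the group label is
    assumed to come from a (hypothetical) demographic classifier.
    Returns the kept ids and the slots still open per group.
    """
    remaining = dict(quotas)
    selected = []
    for identity_id, group in candidates:
        if remaining.get(group, 0) > 0:
            selected.append(identity_id)
            remaining[group] -= 1
    return selected, remaining


# Hypothetical target: 50% / 30% / 20% over 7 identities.
quotas = allocate_quotas({"group_a": 5, "group_b": 3, "group_c": 2}, 7)
print(quotas)  # {'group_a': 4, 'group_b': 2, 'group_c': 1}

candidates = [(0, "group_a"), (1, "group_c"), (2, "group_c"), (3, "group_b")]
selected, remaining = select_identities(candidates, quotas)
print(selected)  # [0, 1, 3]
```

Any non-zero entry left in `remaining` means more candidates must be generated for that group. This rejection-style loop is simple but can be slow for rare groups, which is one reason a generator that can be steered toward a group directly may be preferable.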