Abstract: Leveraging the capabilities of Knowledge Distillation (KD) strategies, we devise a strategy to fight the recent retraction of face recognition datasets. Given a pretrained Teacher model trained on a real dataset, we show that carefully utilising synthetic datasets, or a mix between real and synthetic datasets, to distil knowledge from this teacher to smaller students can yield surprising results. To this end, we trained 33 different models with and without KD, on different datasets, with different architectures and losses. Our findings are consistent: using KD leads to performance gains across all ethnicities and decreased bias. In addition, it helps to mitigate the performance gap between real and synthetic datasets. This approach addresses the limitations of synthetic data training, improving both the accuracy and fairness of face recognition models.
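
The teacher-to-student transfer described in the abstract can be illustrated with a small sketch. The source does not state which distillation loss is used, so this assumes a feature-level loss: the mean squared error between L2-normalised student and teacher embeddings, added to an identity-classification loss. The function names and the `kd_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def embedding_kd_loss(student_emb, teacher_emb):
    """Assumed KD term: MSE between unit-length student and teacher embeddings."""
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embeddings must have the same dimension")
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(task_loss, student_emb, teacher_emb, kd_weight=1.0):
    """Assumed combination: classification loss plus a weighted KD term.

    `task_loss` stands in for whatever identity loss the student is trained
    with; `kd_weight` is a hypothetical balancing factor.
    """
    return task_loss + kd_weight * embedding_kd_loss(student_emb, teacher_emb)
```

In this sketch the teacher is assumed to be frozen, so only the student would be updated by the combined loss. The images fed to both networks could come from a synthetic dataset or a real/synthetic mix, since the KD term needs only the teacher's embedding of each image.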

What problem does this paper attempt to address?

This paper aims to solve the following two key problems:

1. **The performance gap of synthetic data in face recognition**:
   - In recent years, many real face recognition datasets have been withdrawn because of privacy and ethical issues. This has led to a shortage of training data, prompting researchers to turn to synthetic data for model training.
   - However, face recognition models trained on synthetic data usually perform worse on the test set than models trained on real data, and training on synthetic data may increase the model's bias.
2. **How to use Knowledge Distillation (KD) to improve the performance and fairness of models trained with synthetic data**:
   - Knowledge Distillation is a technique that transfers the knowledge of a large pre-trained model (the teacher) to a small model (the student). In this way, even a student trained on synthetic data can benefit from the teacher, improving its performance and fairness.
   - The paper explores how KD can improve the accuracy and fairness of face recognition models trained on synthetic data without relying on a large amount of real data.

### Specific research objectives

- **Evaluate the impact of KD on models trained with synthetic data**: Investigate whether KD can significantly improve the performance of models trained only on synthetic data.
- **The effect of mixed data**: Investigate whether mixing synthetic and real data is beneficial when some real data are missing.
- **Fairness evaluation**: Verify whether the KD strategy can reduce the model's bias, especially the performance differences between ethnic groups.

### Main contributions

- Proposed a method of combining multiple synthetic datasets and sampling based on ethnic balance.
- Explored the effects of KD under different architectures, loss functions, training datasets and test sets.
- Verified the effectiveness of KD in improving model fairness and reducing bias.
- Found that models trained entirely on synthetic data are most affected by the KD strategy, but their performance and fairness can still be significantly improved through KD.

Through these studies, the paper provides new insights into how to train more accurate and fairer face recognition models using synthetic data and Knowledge Distillation when real data are unavailable.
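
The fairness evaluation mentioned above compares performance across ethnic groups. The summary does not say which bias metric the paper reports, so the sketch below assumes two common summaries: the standard deviation of per-group accuracy and the gap between the best and worst group. The function names, group labels and record format are hypothetical.

```python
import statistics


def group_accuracy(records):
    """Compute accuracy per group from (group, is_correct) pairs."""
    totals, correct = {}, {}
    for group, is_correct in records:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(is_correct)
    return {g: correct[g] / totals[g] for g in totals}


def bias_summary(acc_by_group):
    """Summarise how unevenly accuracy is spread across groups.

    A lower "std" and a smaller "gap" indicate more uniform performance.
    """
    values = list(acc_by_group.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),   # population standard deviation
        "gap": max(values) - min(values),   # best group minus worst group
    }
```

Comparing `bias_summary` for a student trained with KD against one trained without it would show whether the mean rises while the spread across groups shrinks, which is the pattern the summary attributes to KD.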

How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition
