How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition

Pedro C. Neto, Ivona Colakovic, Sašo Karakatič, Ana F. Sequeira
2024-08-31
Abstract: Leveraging the capabilities of Knowledge Distillation (KD) strategies, we devise a strategy to fight the recent retraction of face recognition datasets. Given a pretrained Teacher model trained on a real dataset, we show that carefully utilising synthetic datasets, or a mix between real and synthetic datasets, to distil knowledge from this teacher to smaller students can yield surprising results. In this sense, we trained 33 different models with and without KD, on different datasets, with different architectures and losses. Our findings are consistent: using KD leads to performance gains across all ethnicities and decreased bias. In addition, it helps to mitigate the performance gap between real and synthetic datasets. This approach addresses the limitations of synthetic data training, improving both the accuracy and fairness of face recognition models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to solve the following two key problems:

1. **The performance gap problem of synthetic data in face recognition**:
   - In recent years, due to privacy and ethical issues, many real face recognition datasets have been withdrawn. This has led to a shortage of training data, prompting researchers to turn to synthetic data for model training.
   - However, face recognition models trained with synthetic data usually perform worse on the test set than models trained with real data, and synthetic training may increase the model's bias.
2. **How to use Knowledge Distillation (KD) to improve the performance and fairness of models trained with synthetic data**:
   - Knowledge Distillation is a technique that transfers the knowledge of a large pre-trained model (teacher model) to a small model (student model). Through this method, even a student model trained with synthetic data can benefit from the teacher model, thereby improving its performance and fairness.
   - The paper explores how KD can improve the accuracy and fairness of face recognition models trained with synthetic data without relying on a large amount of real data.

### Specific research objectives

- **Evaluate the impact of KD on models trained with synthetic data**: Research whether KD can significantly improve the performance of models trained only with synthetic data.
- **The effect of mixed data**: Research whether it is beneficial to mix synthetic data and real data when some real data are missing.
- **Fairness evaluation**: Verify whether the KD strategy can reduce the model's bias, especially the performance differences between different ethnic groups.

### Main contributions

- Proposed a method of combining multiple synthetic datasets and sampling based on ethnic balance.
- Explored the KD effects under different architectures, loss functions, training datasets and test sets.
- Verified the effectiveness of KD in improving model fairness and reducing bias.
- Found that models trained entirely on synthetic data are most affected by the KD strategy, but their performance and fairness can still be significantly improved through KD.

Through these studies, the paper provides new insights into how to train more accurate and fairer face recognition models using synthetic data and Knowledge Distillation in the absence of real data.
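
To make the teacher-to-student transfer and the fairness check more concrete, here is a minimal sketch in plain Python. This summary does not state the paper's exact distillation objective or fairness metric, so everything below is an assumption for illustration: the distillation term is taken to be a mean squared error between L2-normalized student and teacher embeddings, the combined objective is a weighted sum with a hypothetical `kd_weight`, and bias is summarized as the standard deviation of per-group accuracy. The function names are hypothetical and do not come from the paper.

```python
import math
from statistics import pstdev


def l2_normalize(vec):
    """Scale a vector to unit length so only its direction is compared."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def embedding_kd_loss(student_emb, teacher_emb):
    """Assumed distillation term: MSE between normalized embeddings.

    The teacher embedding comes from the frozen model trained on real data;
    the student embedding comes from the smaller model being trained.
    """
    if len(student_emb) != len(teacher_emb):
        raise ValueError("student and teacher embeddings must have equal size")
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(task_loss, kd_loss, kd_weight=1.0):
    """Assumed combination: the student's own loss plus a weighted KD term."""
    return task_loss + kd_weight * kd_loss


def accuracy_spread(per_group_accuracy):
    """Population standard deviation of accuracy across groups.

    A lower value means performance is more even across groups.
    """
    return pstdev(list(per_group_accuracy.values()))


if __name__ == "__main__":
    # Toy values for illustration only, not results from the paper.
    kd = embedding_kd_loss([0.6, 0.8, 0.0], [0.0, 1.0, 0.0])
    print("KD term:", kd)
    print("Total loss:", total_loss(task_loss=1.0, kd_loss=kd, kd_weight=2.0))
    print("Spread:", accuracy_spread({"group_a": 0.90, "group_b": 0.80}))
```

An embedding-level term is used here because a student trained on synthetic identities generally does not share the teacher's identity classes, which makes matching class logits less natural. A logit-based formulation with softened probabilities would be an equally valid sketch under different assumptions.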