Score Normalization for Demographic Fairness in Face Recognition

Yu Linghu, Tiago de Freitas Pereira, Christophe Ecabert, Sébastien Marcel, Manuel Günther
2024-07-22
Abstract: Fair biometric algorithms have similar verification performance across different demographic groups given a single decision threshold. Unfortunately, for state-of-the-art face recognition networks, score distributions differ between demographics. Contrary to work that tries to align those distributions by extra training or fine-tuning, we solely focus on score post-processing methods. As proved, well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness for high-security operating points. Thus, we extend the standard Z/T-norm to integrate demographic information in normalization. Additionally, we investigate several possibilities to incorporate cohort similarities for both genuine and impostor pairs per demographic to improve fairness across different operating points. We run experiments on two datasets with different demographics (gender and ethnicity) and show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks, without downgrading verification performance. We also indicate that an equal contribution of False Match Rate (FMR) and False Non-Match Rate (FNMR) in fairness evaluation is required for the highest gains. Code and protocols are available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the unequal verification performance of face recognition systems across demographic groups (such as gender and ethnicity). With a single decision threshold, existing face recognition algorithms perform differently for different groups, mainly because the similarity score distributions differ between those groups.

### Specific manifestations of the problem

1. **Score distribution differences**: For state-of-the-art face recognition networks, similarity score distributions differ noticeably between demographic groups. For example, matching comparisons for people of African descent usually result in lower similarity scores, while the opposite is true for white groups.
2. **Single-threshold problem**: Because the score distributions differ, a single threshold leads to large differences in verification performance between groups. For example, the False Match Rate (FMR) and the False Non-Match Rate (FNMR) vary from group to group.
3. **Limitations of existing methods**: Some studies align the score distributions through additional training or fine-tuning, but this requires retraining the model, which increases complexity and cost.

### Solutions in the paper

To address these problems, the paper relies on score post-processing rather than additional training or fine-tuning. Specifically:

1. **Extending the standard Z/T-norm methods**: The paper extends the traditional Z-norm and T-norm methods so that demographic information is integrated into the normalization. Adjusting the score distributions in this way makes FMR and FNMR more consistent across demographic groups.
2. **Introducing new normalization techniques**: The paper proposes several cohort-based score normalization techniques, including:
   - **Impostor Norm**: Normalize using only impostor scores.
   - **Platt Scaling**: Use logistic regression to align the score distributions of different groups.
   - **Bimodal CDF**: Normalize by combining the cumulative distribution functions of genuine and impostor scores.
3. **Experimental verification**: The paper runs experiments on two datasets, one evaluated for gender and one for ethnicity. The results show that the proposed methods generally improve the fairness of five pre-trained face recognition networks without degrading verification performance.

### Main contributions

1. Propose score normalization methods that improve the fairness of face recognition systems without additional training.
2. Extend the Z/T-norm methods to integrate demographic information, and propose three cohort-based methods.
3. Develop a new protocol for the RFW dataset and define the cohorts for the original and new protocols.
4. Study the relative contributions of FNMR and FMR in fairness evaluation.

Through these methods, the paper aims to make face recognition systems perform more consistently across demographic groups, thereby improving overall fairness.
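
The idea of integrating demographic information into Z-norm can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_group_stats`, `znorm_by_group`, and `fmr` are hypothetical, the toy scores are invented, and the sketch assumes that every score carries a single demographic label and that a cohort of impostor scores is available for each group. Each score is shifted and scaled by the mean and standard deviation of the impostor cohort scores from its own group.

```python
import numpy as np


def fit_group_stats(cohort_scores, cohort_groups):
    """Return {group: (mean, std)} of impostor cohort scores per demographic group."""
    scores = np.asarray(cohort_scores, dtype=float)
    groups = np.asarray(cohort_groups)
    stats = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        stats[g.item()] = (float(s.mean()), float(s.std()))
    return stats


def znorm_by_group(scores, groups, stats):
    """Z-normalize each score with the statistics of its own demographic group."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    out = np.full(scores.shape, np.nan)  # groups without cohort statistics stay NaN
    for g, (mu, sigma) in stats.items():
        mask = groups == g
        out[mask] = (scores[mask] - mu) / sigma  # assumes sigma > 0
    return out


def fmr(impostor_scores, threshold):
    """False Match Rate: fraction of impostor scores accepted at the threshold."""
    return float(np.mean(np.asarray(impostor_scores, dtype=float) >= threshold))


# Toy impostor cohort (invented values): group "B" scores sit higher than group "A".
cohort_scores = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5]
cohort_groups = ["A", "A", "A", "B", "B", "B"]
stats = fit_group_stats(cohort_scores, cohort_groups)

# After normalization both groups share one scale, so a single threshold
# gives the same FMR for each group on this toy cohort.
normed = znorm_by_group(cohort_scores, cohort_groups, stats)
```

In this toy setting the two groups differ only by a shift, so per-group normalization aligns them exactly. Real score distributions also differ in shape and in their genuine scores, which is presumably why the paper additionally considers cohort-based methods such as Platt Scaling and Bimodal CDF.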