Abstract: Fair biometric algorithms have similar verification performance across different demographic groups given a single decision threshold. Unfortunately, for state-of-the-art face recognition networks, score distributions differ between demographics. Contrary to work that tries to align those distributions by extra training or fine-tuning, we solely focus on score post-processing methods. As proved, well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness for high-security operating points. Thus, we extend the standard Z/T-norm to integrate demographic information in normalization. Additionally, we investigate several possibilities to incorporate cohort similarities for both genuine and impostor pairs per demographic to improve fairness across different operating points. We run experiments on two datasets with different demographics (gender and ethnicity) and show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks, without downgrading verification performance. We also indicate that an equal contribution of False Match Rate (FMR) and False Non-Match Rate (FNMR) in fairness evaluation is required for the highest gains. Code and protocols are available.
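The fairness criterion in the abstract (similar error rates across groups at one shared threshold) can be made concrete with a small sketch. The function names, the toy scores, and the threshold of 0.5 below are illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np

def error_rates(scores, is_genuine, threshold):
    """Return (FMR, FNMR) for similarity scores at one decision threshold."""
    scores = np.asarray(scores, dtype=float)
    is_genuine = np.asarray(is_genuine, dtype=bool)
    accepted = scores >= threshold
    fmr = float(np.mean(accepted[~is_genuine]))   # impostor pairs wrongly accepted
    fnmr = float(np.mean(~accepted[is_genuine]))  # genuine pairs wrongly rejected
    return fmr, fnmr

def per_group_error_rates(scores, is_genuine, groups, threshold):
    """Apply the same threshold to every demographic group separately."""
    scores = np.asarray(scores, dtype=float)
    is_genuine = np.asarray(is_genuine, dtype=bool)
    groups = np.asarray(groups)
    return {
        str(g): error_rates(scores[groups == g], is_genuine[groups == g], threshold)
        for g in np.unique(groups)
    }

# Hypothetical toy data: group B's scores are shifted lower than group A's.
scores     = [0.9, 0.8, 0.2, 0.6, 0.7, 0.4, 0.1, 0.3]
is_genuine = [1,   1,   0,   0,   1,   1,   0,   0]
groups     = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = per_group_error_rates(scores, is_genuine, groups, threshold=0.5)
for group, (fmr, fnmr) in rates.items():
    print(f"group {group}: FMR={fmr:.2f} FNMR={fnmr:.2f}")
```

In this toy setup the single threshold produces false matches only in group A and false non-matches only in group B, which is the kind of disparity the paper sets out to reduce.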

What problem does this paper attempt to address?

This paper addresses the lack of fairness of face recognition systems across demographic groups (such as gender and ethnicity). Existing face recognition algorithms verify some demographic groups more accurately than others, even when the same decision threshold is applied to all of them. This disparity stems mainly from differences in the score distributions of the groups.

### Specific manifestations of the problem

1. **Score distribution differences**: For state-of-the-art face recognition networks, the similarity score distributions differ significantly between demographic groups. For example, matching comparisons for people of African descent usually result in lower similarity scores, while the opposite is true for white groups.
2. **Single-threshold problem**: Because the score distributions differ between groups, a single threshold leads to large differences in verification performance: the False Match Rate (FMR) and the False Non-Match Rate (FNMR) are inconsistent across groups.
3. **Limitations of existing methods**: Some studies try to align these score distributions through additional training or fine-tuning, which requires retraining the model and increases complexity and cost.

### Solutions in the paper

To address these problems, the paper proposes score post-processing methods that do not rely on additional training or fine-tuning. Specifically:

1. **Extending the standard Z/T-norm methods**: The paper extends the traditional Z-norm and T-norm methods by integrating demographic information into the normalization process. Adjusting the score distributions in this way makes the FMR and FNMR of different demographic groups more consistent.
2. **Introducing new normalization techniques**: The paper proposes several new score normalization techniques, including:
   - **Impostor Norm**: Normalize using only impostor scores.
   - **Platt Scaling**: Use logistic regression to align the score distributions of different groups.
   - **Bimodal CDF**: Normalize by combining the cumulative distribution functions of genuine and impostor scores.
3. **Experimental verification**: The paper runs experiments on two datasets, one evaluated for gender and the other for ethnicity. The results show that the new methods generally improve the fairness of the system without degrading verification performance.

### Main contributions

1. Propose score normalization methods that improve the fairness of face recognition systems without additional training.
2. Extend the Z/T-norm methods to integrate demographic information, and propose three cohort-based methods.
3. Develop a new protocol for the RFW dataset and define the cohorts for the original and new protocols.
4. Study the relative contributions of FNMR and FMR in fairness evaluation.

Through these methods, the paper aims to make face recognition systems perform more consistently across demographic groups, thereby improving the overall fairness of the system.
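The idea of integrating demographic information into Z-norm-style normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes each demographic group has its own cohort of impostor scores, and it standardizes every test score with the mean and standard deviation of the cohort for its group. The names `fit_group_stats` and `normalize`, and all toy values, are hypothetical.

```python
import numpy as np

def fit_group_stats(cohort_scores, cohort_groups):
    """Mean and standard deviation of impostor cohort scores per group."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    cohort_groups = np.asarray(cohort_groups)
    stats = {}
    for g in np.unique(cohort_groups):
        s = cohort_scores[cohort_groups == g]
        stats[str(g)] = (float(s.mean()), float(s.std()))
    return stats

def normalize(scores, groups, stats, eps=1e-8):
    """Standardize each score with the statistics of its own group."""
    scores = np.asarray(scores, dtype=float)
    out = np.empty_like(scores)
    for i, (s, g) in enumerate(zip(scores, groups)):
        mu, sigma = stats[str(g)]
        out[i] = (s - mu) / (sigma + eps)  # eps guards against a zero std
    return out

# Hypothetical impostor cohorts: group B's impostor scores sit lower.
cohort_scores = [0.2, 0.4, 0.0, 0.2]
cohort_groups = ["A", "A", "B", "B"]
stats = fit_group_stats(cohort_scores, cohort_groups)

# Two raw test scores that a single threshold would treat differently.
raw = [0.5, 0.3]
normalized = normalize(raw, ["A", "B"], stats)
print(normalized)
```

After normalization both toy scores lie about two cohort standard deviations above their group's impostor mean, so one threshold on the normalized scores treats the two groups alike. Because this sketch uses impostor statistics only, it mainly targets FMR consistency; handling FNMR as well would need genuine-pair information, as the genuine and impostor cohort variants summarized above suggest.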

Score Normalization for Demographic Fairness in Face Recognition
