What Should Be Balanced in a "Balanced" Face Recognition Dataset?

Haiyu Wu, Kevin W. Bowyer
2023-08-24
Abstract: The issue of demographic disparities in face recognition accuracy has attracted increasing attention in recent years. Various face image datasets have been proposed as 'fair' or 'balanced' to assess the accuracy of face recognition algorithms across demographics. These datasets typically balance the number of identities and images across demographics. It is important to note that the number of identities and images in an evaluation dataset are *not* driving factors for 1-to-1 face matching accuracy. Moreover, balancing the number of identities and images does not ensure balance in other factors known to impact accuracy, such as head pose, brightness, and image quality. We demonstrate these issues using several recently proposed datasets. To improve the ability to perform less biased evaluations, we propose a bias-aware toolkit that facilitates creation of cross-demographic evaluation datasets balanced on factors mentioned in this paper.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses differences in facial recognition accuracy across demographic groups. Specifically, the researchers argue that merely balancing the number of identities and images in a dataset is not sufficient to ensure a fair evaluation of facial recognition algorithms. Instead, they propose balancing factors known to affect facial recognition accuracy, such as image quality, head pose, and lighting. The main contributions include:

1. **Revealing the Problem**: Demonstrating that some datasets previously considered "fair" or "balanced" are not balanced in key factors that are crucial for accuracy.
2. **Proposing a New Dataset**: Introducing a new test dataset called BA-test, which is balanced across multiple factors affecting accuracy, thereby supporting fair evaluation across different populations.
3. **Providing a Toolkit**: Offering a toolkit (BA-toolkit) for balancing various factors in datasets, such as lighting, head pose, image quality, and visible facial area.
4. **Benchmarking**: Benchmarking shows that current state-of-the-art models have the lowest accuracy on Asian females and the highest accuracy on white males, revealing accuracy disparities across gender, age, and race.

Through these methods, the researchers hope to improve the fairness and reliability of facial recognition technology in practical applications.
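The idea of balancing on an accuracy-related factor, rather than only on identity and image counts, can be illustrated with a minimal sketch. This is not the BA-toolkit's actual interface: the function names `bin_index` and `balance_on_factor`, the record fields `group` and `brightness`, and the bin edge of 100 are all hypothetical choices made for illustration. The sketch bins images by one factor and subsamples so that every demographic group contributes the same number of images to each bin.

```python
import bisect
import random
from collections import defaultdict


def bin_index(value, bin_edges):
    """Return the index of the bin that `value` falls into.

    `bin_edges` must be sorted; N edges define N + 1 bins.
    """
    return bisect.bisect_right(bin_edges, value)


def balance_on_factor(records, factor, bin_edges, seed=0):
    """Subsample records so each group has equal counts in every factor bin.

    `records` is a list of dicts with a "group" key (hypothetical field name)
    and a numeric value under `factor`, e.g. a brightness score.
    """
    rng = random.Random(seed)

    # Collect records per (group, bin) cell.
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["group"], bin_index(rec[factor], bin_edges))].append(rec)

    groups = sorted({rec["group"] for rec in records})
    if not groups:
        return []

    balanced = []
    for b in range(len(bin_edges) + 1):
        # The smallest group in this bin sets the quota for all groups.
        quota = min(len(cells[(g, b)]) for g in groups)
        for g in groups:
            balanced.extend(rng.sample(cells[(g, b)], quota))
    return balanced


if __name__ == "__main__":
    # Toy data: group A is mostly dark, group B is mostly bright.
    data = (
        [{"group": "A", "brightness": v} for v in (10, 20, 30, 200, 210)]
        + [{"group": "B", "brightness": v} for v in (15, 150, 160, 170, 220, 230)]
    )
    subset = balance_on_factor(data, "brightness", bin_edges=[100])
    print(len(subset), "images kept out of", len(data))
```

Taking the per-bin minimum across groups discards data, which is the trade-off of this simple approach. Balancing several factors at once (for example pose and image quality together) would require binning on their combination, which shrinks the usable subset further.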