Abstract: There are several algorithms for measuring fairness of ML models. A fundamental assumption in these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that is a result of historical and societal biases and discrimination. Models trained on these datasets will inherit and propagate the biases to the model outputs. We propose FAIRLABEL, an algorithm which detects and corrects biases in labels. The goal of FAIRLABEL is to reduce the Disparate Impact (DI) across groups while maintaining high accuracy in predictions. We propose metrics to measure the quality of bias correction and validate FAIRLABEL on synthetic datasets, showing that the label correction is correct 86.7% of the time vs. 71.9% for a baseline model. We also apply FAIRLABEL on benchmark datasets such as UCI Adult, German Credit Risk, and Compas, and show that the Disparate Impact Ratio increases by as much as 54.2%.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve the label bias problem in machine learning (ML) models. Specifically, the paper proposes an algorithm named **FAIRLABEL** for detecting and correcting label biases in datasets. The following are the main problems that the paper attempts to solve:

1. **Label Bias Problem**:
   - In real-world datasets, the so-called "true labels" often contain historical and social biases, which can lead to unfair decisions.
   - If these biases are not dealt with, the trained ML models will inherit and propagate these biases, resulting in unfair outcomes.
2. **Disparate Impact (DI) Problem**:
   - The paper pays special attention to how to reduce the disparate impact among different groups while maintaining high prediction accuracy.
   - Disparate Impact refers to the degree to which a specific group is significantly more adversely affected in model predictions than other groups. For example, in decision-making such as recruitment, loan applications, and criminal justice, minority groups may be more likely to be adversely affected.
3. **Impracticability of Manual Label Correction**:
   - Manually re-evaluating and correcting labels in historical decisions is impractical because detailed historical data and information are usually no longer available.
   - Therefore, an automated method is required to correct the biases in these labels.
4. **Limitations of Existing Methods**:
   - Most existing fairness algorithms assume that the true labels are fair or unbiased, but this is not the case in reality.
   - Most research focuses on feature selection, model adjustment, etc., and less attention is paid to the bias of the labels themselves.

### FAIRLABEL's Solutions

To address the above problems, FAIRLABEL proposes the following solutions:

- **Label Correction Algorithm**: By detecting and correcting biases in labels, reduce the disparate impact among different groups.
- **Synthetic Data Generation Framework**: Used to verify the effectiveness of FAIRLABEL by injecting biases into synthetic data and measuring the algorithm's correction ability.
- **Benchmark Dataset Experiments**: Verify the performance of FAIRLABEL on multiple public datasets (such as UCI Adult, German Credit Risk, and Compas) and show its significant improvement in reducing disparate impact.

Through these methods, FAIRLABEL aims to provide an effective and scalable solution to reduce label biases in machine learning models, thereby improving the fairness of decision-making.
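To make the Disparate Impact idea and the synthetic bias-injection step described above more concrete, here is a minimal Python sketch. It is not FAIRLABEL's implementation. It assumes the common definition of the Disparate Impact Ratio as the favorable-outcome rate of the unprivileged group divided by that of the privileged group; the paper's exact formulation may differ. The function names, the toy data, and the flipping scheme in `inject_label_bias` are hypothetical and chosen only for illustration.

```python
import random


def favorable_rate(labels, groups, group, favorable=1):
    """Share of `group` members whose label equals the favorable outcome."""
    members = [y for y, g in zip(labels, groups) if g == group]
    if not members:
        raise ValueError(f"no samples for group {group!r}")
    return sum(y == favorable for y in members) / len(members)


def disparate_impact_ratio(labels, groups, unprivileged, privileged, favorable=1):
    """Favorable rate of the unprivileged group divided by the privileged group's."""
    privileged_rate = favorable_rate(labels, groups, privileged, favorable)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(labels, groups, unprivileged, favorable) / privileged_rate


def inject_label_bias(labels, groups, unprivileged, flip_prob, rng,
                      favorable=1, unfavorable=0):
    """Hypothetical bias injection (an assumption, not the paper's scheme):
    flip favorable labels of the unprivileged group to unfavorable with
    probability `flip_prob`. Returns the biased labels and flipped indices."""
    biased, flipped = list(labels), []
    for i, (y, g) in enumerate(zip(labels, groups)):
        if g == unprivileged and y == favorable and rng.random() < flip_prob:
            biased[i] = unfavorable
            flipped.append(i)
    return biased, flipped


# Toy data: group "a" is treated as privileged, group "b" as unprivileged.
groups = ["a"] * 5 + ["b"] * 5
labels = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]  # favorable rates: a = 0.8, b = 0.4

print(disparate_impact_ratio(labels, groups, "b", "a"))  # 0.5

# With flip_prob=1.0 every favorable "b" label is flipped, so the ratio drops to 0.
biased, flipped = inject_label_bias(labels, groups, "b", flip_prob=1.0,
                                    rng=random.Random(0))
print(disparate_impact_ratio(biased, groups, "b", "a"))  # 0.0
print(flipped)  # [5, 6]
```

A ratio near 1 indicates similar favorable-outcome rates across the two groups, and a value below 0.8 is often treated as a warning sign under the "four-fifths" rule of thumb. Because the flipped indices are known in a synthetic setting, a label-correction method can be scored by how many of them it restores, which is the kind of evaluation the synthetic data framework makes possible.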

FAIRLABEL: Correcting Bias in Labels
