Abstract: Anti-discrimination is an increasingly important task in data science. In this paper, we investigate the problem of discovering both direct and indirect discrimination from historical data, and removing the discriminatory effects before the data is used for predictive analysis (e.g., building classifiers). We use a causal network to capture the causal structure of the data. We then model direct and indirect discrimination as path-specific effects, which explicitly distinguish the two types of discrimination as causal effects transmitted along different paths in the network. Based on this, we propose an effective algorithm for discovering direct and indirect discrimination, as well as an algorithm for precisely removing both types of discrimination while retaining good data utility. Unlike previous work, our approaches can ensure that the predictive models built from the modified data will not incur discrimination in decision making. Experiments on real datasets show the effectiveness of our approaches.

What problem does this paper attempt to address?

This paper addresses how to discover and remove both direct and indirect discrimination in historical data, so that predictive models built from that data (such as classifiers) do not make discriminatory decisions. Specifically, the authors use a causal network to capture the causal structure of the data and model direct and indirect discrimination as path-specific effects, which cleanly separates the two types of discrimination.

### Main problem description:

1. **Discovering discrimination**:
   - **Direct discrimination**: an individual is treated unfavorably because of their value of a protected attribute (such as gender or race).
   - **Indirect discrimination**: decisions are based on seemingly neutral, non-protected attributes (such as postal code) that are associated with the protected attribute, so the protected group is still treated unfairly.
2. **Removing discrimination**:
   - Remove discrimination while preserving data utility, i.e., without significantly reducing the predictive power of the data.
   - Ensure that predictive models built on the modified data do not discriminate in their decisions.

### Method overview:

- **Causal network modeling**: represent the causal relationships in the data as a causal network, a directed acyclic graph (DAG) in which each node is an attribute and each edge is a direct causal relationship.
- **Path-specific effects**: model direct and indirect discrimination as causal effects transmitted along specific paths, and measure them with the path-specific effects \(SE_{\pi_d}\) and \(SE_{\pi_i}\).
- **Algorithm design**:
  - **Discovery algorithm (PSE-DD)**: determines whether direct or indirect discrimination is present by computing these path-specific effects.
  - **Removal algorithm (PSE-DR)**: removes discrimination by modifying the conditional probability table (CPT) of the decision attribute in the causal network while minimizing the impact on data utility.
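The core computation behind discovery can be illustrated on a toy network. The Python below is a minimal sketch, not the authors' PSE-DD implementation: it assumes a hypothetical three-node network in which the protected attribute C is a root, Z is a single redlining attribute (C → Z → E), and there is also a direct edge C → E; all CPT values are made up. Because C has no parents here, \(do(c)\) reduces to conditioning, and the two effects take the closed forms \(SE_{\pi_d}(c_1, c_0) = \sum_z P(z \mid c_0)\,[P(e^+ \mid c_1, z) - P(e^+ \mid c_0, z)]\) and \(SE_{\pi_i}(c_1, c_0) = \sum_z [P(z \mid c_1) - P(z \mid c_0)]\,P(e^+ \mid c_0, z)\). The function names and the decision rule in `discover` (flag discrimination when either ordering exceeds a threshold τ) are our own assumptions; the general case needs the identification machinery the paper relies on, which this toy sidesteps.

```python
# Hypothetical toy network: C -> Z -> E and C -> E, all binary (illustrative numbers).
# C = protected attribute ("c+", "c-"), Z = redlining attribute (e.g. postal code),
# E = decision. C has no parents here, so do(C = c) reduces to conditioning on c.
P_Z1 = {"c+": 0.7, "c-": 0.3}                  # P(z=1 | c)
P_E = {("c+", 1): 0.8, ("c+", 0): 0.6,         # P(e+ | c, z)
       ("c-", 1): 0.7, ("c-", 0): 0.4}


def p_z(z, c):
    """P(Z = z | C = c) for binary Z."""
    return P_Z1[c] if z == 1 else 1.0 - P_Z1[c]


def p_e_plus(c_direct, c_via_z):
    """P(e+) when c_direct is transmitted along C -> E and c_via_z along C -> Z -> E."""
    return sum(p_z(z, c_via_z) * P_E[(c_direct, z)] for z in (0, 1))


def se_direct(c1, c0):
    """SE_{pi_d}(c1, c0) = P(e+ | do(c1 | pi_d)) - P(e+ | do(c0)), with pi_d = {C -> E}."""
    return p_e_plus(c1, c0) - p_e_plus(c0, c0)


def se_indirect(c1, c0):
    """SE_{pi_i}(c1, c0), with pi_i = {C -> Z -> E}: c1 reaches E only through Z."""
    return p_e_plus(c0, c1) - p_e_plus(c0, c0)


def discover(tau):
    """Assumed rule: flag direct / indirect discrimination if either ordering exceeds tau."""
    direct = max(se_direct("c+", "c-"), se_direct("c-", "c+")) > tau
    indirect = max(se_indirect("c+", "c-"), se_indirect("c-", "c+")) > tau
    return direct, indirect


if __name__ == "__main__":
    print(round(se_direct("c+", "c-"), 2), round(se_indirect("c+", "c-"), 2))  # 0.17 0.12
    print(discover(tau=0.05))  # (True, True)
```

Note that \(SE_{\pi}(c^-, c^+)\) is not simply the negation of \(SE_{\pi}(c^+, c^-)\) (here −0.13 versus 0.17 for the direct path), which is why both orderings are checked.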
### Formula presentation:

Here C is the protected attribute with values \(c^+\) and \(c^-\), E is the decision with positive value \(e^+\), and \(do(c^+|\pi)\) denotes the intervention in which \(c^+\) is transmitted only along the paths in \(\pi\) while \(c^-\) is transmitted along all other paths. \(\pi_d\) contains only the direct edge C → E, and \(\pi_i\) contains the causal paths from C to E that pass through the non-protected (redlining) attributes.

1. **Path-specific effects**:
   - Path-specific effect of direct discrimination:
     \[ SE_{\pi_d}(c^+, c^-) = P(e^+ \mid do(c^+|\pi_d)) - P(e^+ \mid do(c^-)) \]
   - Path-specific effect of indirect discrimination:
     \[ SE_{\pi_i}(c^+, c^-) = P(e^+ \mid do(c^+|\pi_i)) - P(e^+ \mid do(c^-)) \]
2. **Optimization problem for removing discrimination**: minimize the squared distance between the joint distributions of the original and the modified causal network (the sum runs over all value combinations \(\mathbf{v}\) of the attribute set V), subject to every discrimination effect staying below the threshold τ:
   \[ \min_{P'} \; \sum_{\mathbf{v}} \big(P'(\mathbf{v}) - P(\mathbf{v})\big)^2 \]
   \[ \text{subject to } SE_{\pi_d}(c^+, c^-) \leq \tau, \quad SE_{\pi_d}(c^-, c^+) \leq \tau, \]
   \[ SE_{\pi_i}(c^+, c^-) \leq \tau, \quad SE_{\pi_i}(c^-, c^+) \leq \tau, \]
   \[ \forall\, Pa(E):\; P'(e^- \mid Pa(E)) + P'(e^+ \mid Pa(E)) = 1, \]
   \[ \forall\, Pa(E), e:\; P'(e \mid Pa(E)) \geq 0 \]

### Experimental verification:

The authors evaluated the algorithms on two real-world datasets (the Adult dataset and the Dutch dataset). The results show that PSE-DD and PSE-DR identify and remove both direct and indirect discrimination while retaining high data utility.

In conclusion, the paper proposes a causal-network framework that models direct and indirect discrimination as path-specific effects and uses these effects both to discover and to remove the two types of discrimination.
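The removal optimization can likewise be sketched on a toy network. The Python below is an illustrative sketch under stated assumptions, not the paper's PSE-DR code: the network (C → Z → E plus a direct edge C → E), all probabilities, and every function name are hypothetical, and only the CPT of E is modified. In that setting each SE constraint is linear in the entries \(q(c, z) = P'(e^+ \mid c, z)\), and the objective \(\sum_{\mathbf{v}} (P'(\mathbf{v}) - P(\mathbf{v}))^2\) becomes a weighted sum of squares in those entries, so the problem is a small quadratic program. We solve it with Dykstra's alternating-projection algorithm, a generic technique chosen here only to keep the example dependency-free; the paper's own solver may differ. The sum-to-one and non-negativity constraints are handled by storing only \(P'(e^+ \mid c, z)\) and clipping it to [0, 1].

```python
# Hypothetical toy network: C -> Z -> E and C -> E, all binary.
# C = protected attribute, Z = redlining attribute, E = decision. Numbers are illustrative.
P_C = {"c+": 0.6, "c-": 0.4}
P_Z1 = {"c+": 0.7, "c-": 0.3}                 # P(z=1 | c)
Q0 = {("c+", 1): 0.8, ("c+", 0): 0.6,         # original CPT P(e+ | c, z)
      ("c-", 1): 0.7, ("c-", 0): 0.4}
KEYS = list(Q0)                               # fixed ordering of the CPT entries we modify


def p_z(z, c):
    return P_Z1[c] if z == 1 else 1.0 - P_Z1[c]


# Only E's CPT changes, so for each (c, z) the cells (c, z, e+) and (c, z, e-) move by
# +/- P(c, z) * (q' - q). Hence sum_v (P'(v) - P(v))^2 = sum_i W[i] * (q'_i - q_i)^2.
W = [2.0 * (P_C[c] * p_z(z, c)) ** 2 for (c, z) in KEYS]


def se_rows():
    """Rows a with a . q = SE value (C is a root here, so do(c) = conditioning)."""
    rows = []
    for c1, c0 in (("c+", "c-"), ("c-", "c+")):
        direct, indirect = [0.0] * len(KEYS), [0.0] * len(KEYS)
        for z in (0, 1):
            # SE_{pi_d}(c1, c0) = sum_z P(z | c0) * (q(c1, z) - q(c0, z))
            direct[KEYS.index((c1, z))] += p_z(z, c0)
            direct[KEYS.index((c0, z))] -= p_z(z, c0)
            # SE_{pi_i}(c1, c0) = sum_z (P(z | c1) - P(z | c0)) * q(c0, z)
            indirect[KEYS.index((c0, z))] += p_z(z, c1) - p_z(z, c0)
        rows += [direct, indirect]
    return rows


def se_values(q):
    """[SE_d(c+,c-), SE_i(c+,c-), SE_d(c-,c+), SE_i(c-,c+)] under CPT q."""
    return [sum(a * q[k] for a, k in zip(row, KEYS)) for row in se_rows()]


def joint_sq_distance(q):
    """Objective sum_v (P'(v) - P(v))^2 over all cells (c, z, e), computed directly."""
    total = 0.0
    for c, z in KEYS:
        pcz = P_C[c] * p_z(z, c)
        total += (pcz * (q[(c, z)] - Q0[(c, z)])) ** 2              # e+ cell
        total += (pcz * ((1 - q[(c, z)]) - (1 - Q0[(c, z)]))) ** 2  # e- cell
    return total


def project_halfspace(x, a, b):
    """W-weighted projection onto {x : a . x <= b}."""
    excess = sum(ai * xi for ai, xi in zip(a, x)) - b
    if excess <= 0:
        return list(x)
    scale = excess / sum(ai * ai / wi for ai, wi in zip(a, W))
    return [xi - scale * ai / wi for xi, ai, wi in zip(x, a, W)]


def project_box(x, a=None, b=None):
    """Keep each P'(e+ | c, z) in [0, 1]; P'(e- | c, z) = 1 - P'(e+ | c, z) is implicit."""
    return [min(1.0, max(0.0, xi)) for xi in x]


def remove_discrimination(tau, sweeps=2000):
    """Dykstra's alternating projections onto the SE <= tau halfspaces and the box."""
    x = [Q0[k] for k in KEYS]
    sets = [(project_halfspace, row, tau) for row in se_rows()] + [(project_box, None, None)]
    increments = [[0.0] * len(x) for _ in sets]
    for _ in range(sweeps):
        for j, (project, a, b) in enumerate(sets):
            shifted = [xi + pi for xi, pi in zip(x, increments[j])]
            x = project(shifted, a, b)
            increments[j] = [si - xi for si, xi in zip(shifted, x)]
    return dict(zip(KEYS, x))


if __name__ == "__main__":
    q_fair = remove_discrimination(tau=0.05)
    print({k: round(v, 3) for k, v in q_fair.items()})
    print([round(s, 3) for s in se_values(q_fair)])
    print(round(joint_sq_distance(q_fair), 5))
```

With τ = 0.05 the sketch should return roughly q(c+,1) ≈ 0.79, q(c+,0) ≈ 0.48, q(c−,1) ≈ 0.61 and q(c−,0) ≈ 0.49: both \(SE(c^+, c^-)\) constraints (originally 0.17 and 0.12) end up tight at 0.05, while the reversed-direction constraints stay slack. For realistic networks, a dedicated QP solver would be the more practical choice than hand-rolled projections.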

A causal framework for discovering and removing direct and indirect discrimination

On Discrimination Discovery and Removal in Ranked Data using Causal Graph

A Causal Framework for Observational Studies of Discrimination

Local Causal Discovery for Structural Evidence of Direct Discrimination

Achieving non-discrimination in prediction

A survey on measuring indirect discrimination in machine learning

Multi-cause Discrimination Analysis Using Potential Outcomes

Causal Discovery for Fairness

A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems

Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition

Proxy Non-Discrimination in Data-Driven Systems

Practical Guide for Causal Pathways and Sub-group Disparity Analysis

Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling

Mitigating bias in artificial intelligence: Fair data generation via causal models for transparent and explainable decision-making

Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification

Modeling and Discovering Direct Causes for Predictive Models

Data Management for Causal Algorithmic Fairness

Algorithmic discrimination: examining its types and regulatory measures with emphasis on US legal practices

A Case Study on a Sustainable Framework for Ethically Aware Predictive Modeling

Big Data, Data Science, and Civil Rights

Why Is My Classifier Discriminatory?