Mohsen Ghassemi, Alan Mishler, Niccolo Dalmasso, Luhao Zhang, Vamsi K. Potluru, Tucker Balch, Manuela Veloso
Abstract: Conditional demographic parity (CDP) is a measure of the demographic parity of a predictive model or decision process when conditioning on an additional feature or set of features. Many algorithmic fairness techniques exist to target demographic parity, but CDP is much harder to achieve, particularly when the conditioning variable has many levels and/or when the model outputs are continuous. The problem of auditing and enforcing CDP is understudied in the literature. In light of this, we propose novel measures of conditional demographic disparity (CDD) which rely on statistical distances borrowed from the optimal transport literature. We further design and evaluate regularization-based approaches based on these CDD measures. Our methods, FairBiT and FairLeap, allow us to target CDP even when the conditioning variable has many levels. When model outputs are continuous, our methods target full equality of the conditional distributions, unlike other methods that only consider first moments or related proxy quantities. We validate the efficacy of our approaches on real-world datasets.
What problem does this paper attempt to address?
The paper addresses how to audit and enforce Conditional Demographic Parity (CDP) effectively, especially when the legitimate (conditioning) features have many levels or the model output is continuous. Specifically, it proposes new measures of Conditional Demographic Disparity (CDD) and designs regularization-based strategies to minimize them.
### Problem Background
Algorithmic decision-making increasingly affects people's lives in fields such as finance, healthcare, and recruitment, which has made algorithmic fairness an important research area. Early work on algorithmic fairness focused mainly on Demographic Parity (DP), which requires the output of a model or decision-making process to be statistically independent of sensitive features (such as race, gender, or disability status). However, DP can permit behavior that is intuitively unfair. For example, in loan approval, men and women may have the same overall approval rate while men are more likely than women to be approved at every income level.
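The loan example can be made concrete with a few hypothetical counts (invented for illustration, not taken from the paper). The sketch below shows DP holding exactly while every within-income-level comparison favors men, which is precisely the kind of violation CDP is meant to catch.

```python
from fractions import Fraction

# Hypothetical loan data (illustrative only): (approved, applicants) per (gender, income level).
COUNTS = {
    ("men", "low"): (32, 80),
    ("men", "high"): (18, 20),
    ("women", "low"): (6, 20),
    ("women", "high"): (44, 80),
}
LEVELS = ("low", "high")


def approval_rate(gender, level=None):
    """Approval rate for a gender, overall (level=None) or within one income level."""
    cells = [v for (g, lvl), v in COUNTS.items() if g == gender and level in (None, lvl)]
    approved = sum(a for a, _ in cells)
    applicants = sum(n for _, n in cells)
    return Fraction(approved, applicants)


# Demographic parity compares overall rates: both equal 1/2.
print(approval_rate("men"), approval_rate("women"))
# Conditional demographic parity compares rates within each income level:
# men 2/5 vs women 3/10 (low income), men 9/10 vs women 11/20 (high income).
for level in LEVELS:
    print(level, approval_rate("men", level), approval_rate("women", level))
```

The equality of the overall rates comes entirely from the different income mix of the two groups (most women in this toy data are in the high-income level, where approval is more common).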
### Conditional Demographic Parity (CDP)
To capture fairness more precisely, Conditional Demographic Parity (CDP) was introduced. CDP requires the model output to be independent of the sensitive features conditional on certain legitimate or explanatory features (such as income level). However, achieving CDP becomes much harder when the legitimate features have many levels or the model output is continuous. Existing methods handle this setting poorly: they struggle when the conditioning variable has many levels, and for continuous outputs they typically consider only first moments or related proxy quantities rather than the full conditional distributions.
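To see why matching only first moments (the limitation the abstract attributes to other methods) is not enough for continuous outputs, consider hypothetical scores for the two groups within a single level of the legitimate feature. The means agree, yet the distributions are clearly different; a distributional distance such as the 1-Wasserstein distance detects this. The helper names below are our own, not from the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def wasserstein1_equal_size(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size samples on the real line.

    With uniform weights and equal sample sizes, the optimal coupling pairs the
    sorted values, so the distance is the mean absolute difference of sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical model scores for A = 0 and A = 1 within one level of L.
scores_a0 = [0.5, 0.5, 0.5, 0.5]
scores_a1 = [0.0, 0.0, 1.0, 1.0]

print(mean(scores_a0) - mean(scores_a1))              # 0.0: first moments match
print(wasserstein1_equal_size(scores_a0, scores_a1))  # 0.5: distributions differ
```

A regularizer built only on the mean difference would treat this level as perfectly fair, whereas a Wasserstein-based disparity would not.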
### Main Contributions of the Paper
1. **New CDD Metrics**:
- Two new general CDD metrics are introduced: CDD in the Wasserstein sense and CDD in the ℓp sense.
- These metrics consider the entire conditional distribution and are applicable to classification and regression tasks.
2. **Regularization-Based Strategies**:
- Two methods, FairBiT and FairLeap, are proposed, targeting CDD in the Wasserstein sense and CDD in the ℓp sense, respectively.
- FairBiT uses the bicausal transport distance to minimize CDD.
- FairLeap aggregates the per-level distributional differences through a weighted ℓp norm.
3. **Practical Applications and Experimental Verification**:
- The effectiveness of these methods is validated on real-world datasets.
- A tunable parameter lets users trade off fairness against predictive performance.
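A generic form of a regularization-based objective with such a tunable parameter is sketched below. The loss \( \ell \), parameters \( \theta \), sample size \( n \), weight \( \lambda \), and empirical estimate \( \widehat{\text{CDD}} \) are our notation, and the paper's exact objective may differ:

\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) \;+\; \lambda \, \widehat{\text{CDD}}(f_\theta), \qquad \lambda \ge 0
\]

Setting \( \lambda = 0 \) recovers ordinary empirical risk minimization; increasing \( \lambda \) puts more weight on reducing the estimated disparity, possibly at some cost in accuracy. Under this reading, FairBiT would plug in an estimate of \( \text{CDD}_{\text{wass}} \) and FairLeap an estimate of \( \text{CDD}_{\ell_p} \).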
### Formula Representation
- **CDD in the Wasserstein Sense**:
\[
\text{CDD}_{\text{wass}}(f) := W_p^p(P(L|A = 0), P(L|A = 1); D)
\]
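where \( W_p \) is the p-Wasserstein distance, \( d \) is a distance between probability distributions, and \( D(l, l') = d(P(f(X)|L = l, A = 0), P(f(X)|L = l', A = 1)) \) serves as the transport cost between levels \( l \) and \( l' \).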
- **CDD in the ℓp Sense**:
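\[
\text{CDD}_{\ell_p}(f) := \|D\|_{\ell_p(L;\, Q(L))} = \left( \int |D(l)|^p \, dQ(l) \right)^{1/p}
\]
where \( D(l) = d(P(f(X)|L = l, A = 0), P(f(X)|L = l, A = 1)) \) and \( Q(L) \) is a probability measure on the levels of \( L \); for discrete \( L \), the integral becomes the weighted sum \( \sum_l Q(l)\, |D(l)|^p \).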
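Both measures can be estimated from samples when \( L \) is discrete. The sketch below is not the authors' FairBiT or FairLeap implementation (training would need differentiable versions); it only evaluates the two formulas under our own assumptions: \( d \) is the 1-Wasserstein distance between empirical score distributions (via `scipy.stats.wasserstein_distance`), \( W_p^p(\cdot, \cdot; D) \) is read as the optimal transport cost \( \min_\pi \sum_{l,l'} \pi(l,l') D(l,l') \) between the empirical marginals of \( L \) in each group (solved as a linear program with `scipy.optimize.linprog`), and \( Q \) is the empirical marginal of \( L \). The names `cdd_wass`, `cdd_lp`, and `_cond_scores` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance


def _cond_scores(scores, levels, groups, level, group):
    """Scores f(X) restricted to L = level and A = group."""
    return scores[(groups == group) & (levels == level)]


def cdd_wass(scores, levels, groups):
    """OT cost between P(L|A=0) and P(L|A=1) with ground cost D(l, l') (our reading)."""
    scores, levels, groups = map(np.asarray, (scores, levels, groups))
    ls0, ls1 = np.unique(levels[groups == 0]), np.unique(levels[groups == 1])
    mu = np.array([np.mean(levels[groups == 0] == l) for l in ls0])
    nu = np.array([np.mean(levels[groups == 1] == l) for l in ls1])
    # D(l, l') = W1 between P(f(X) | L=l, A=0) and P(f(X) | L=l', A=1).
    D = np.array([[wasserstein_distance(_cond_scores(scores, levels, groups, l, 0),
                                        _cond_scores(scores, levels, groups, m, 1))
                   for m in ls1] for l in ls0])
    n0, n1 = D.shape
    # Transport plan pi is flattened row-major; rows must sum to mu, columns to nu.
    A_eq = np.zeros((n0 + n1, n0 * n1))
    for i in range(n0):
        A_eq[i, i * n1:(i + 1) * n1] = 1.0
    for j in range(n1):
        A_eq[n0 + j, j::n1] = 1.0
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


def cdd_lp(scores, levels, groups, p=2.0):
    """(sum_l Q(l) * D(l)^p)^(1/p), with D(l) = W1 between the two groups at level l."""
    scores, levels, groups = map(np.asarray, (scores, levels, groups))
    shared = np.intersect1d(levels[groups == 0], levels[groups == 1])
    q = np.array([np.sum(levels == l) for l in shared], dtype=float)
    q /= q.sum()  # Q(L): empirical marginal of L, restricted to levels seen in both groups
    D = np.array([wasserstein_distance(_cond_scores(scores, levels, groups, l, 0),
                                       _cond_scores(scores, levels, groups, l, 1))
                  for l in shared])
    return float(np.sum(q * D ** p) ** (1.0 / p))


# Toy data: 3 levels of L, 50 samples per (level, group); group 1 is shifted by 0.9 only at level 2.
rng = np.random.default_rng(0)
noise = rng.normal(size=50)
lv = np.repeat([0, 1, 2], 50)
s0 = lv + np.tile(noise, 3)
levels = np.concatenate([lv, lv])
groups = np.repeat([0, 1], 150)
scores = np.concatenate([s0, s0 + np.where(lv == 2, 0.9, 0.0)])

print(cdd_lp(scores, levels, groups, p=1))  # about 0.3 = (1/3) * 0.9
print(cdd_lp(scores, levels, groups, p=2))  # about 0.52 = sqrt(0.81 / 3)
print(cdd_wass(scores, levels, groups))     # about 0.3
```

Because the disparity is concentrated at one level, the \( \ell_2 \) version weighs it more heavily than the \( \ell_1 \) version; larger \( p \) emphasizes concentrated violations. For this toy data, `cdd_wass` also comes out at about 0.3, since matching each level to itself is already an optimal transport plan here.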
### Summary
By introducing new CDD measures and regularization-based strategies, the paper addresses key limitations of existing methods when the legitimate features have many levels or the model outputs are continuous, providing a practical way to target Conditional Demographic Parity.