Abstract: As machine learning (ML) applications are deployed ever more widely in the real world, concerns about discrimination hidden in ML models are growing, particularly in high-stakes domains. Existing techniques for assessing the discrimination level of ML models include commonly used group and individual fairness measures. However, these two types of fairness measures are often hard to reconcile with each other, and even two different group fairness measures may be mutually incompatible. To address this issue, we evaluate the discrimination level of classifiers from a manifold perspective and propose a "harmonic fairness measure via manifolds (HFM)" based on distances between sets. Yet computing these distances directly can be prohibitively expensive, which reduces the measure's practical applicability. We therefore devise an approximation algorithm named "Approximation of distance between sets (ApproxDist)" to estimate these distances accurately, and we further demonstrate its algorithmic effectiveness under certain reasonable assumptions. Empirical results indicate that the proposed fairness measure HFM is valid and that ApproxDist is effective and efficient.
What problem does this paper attempt to address?
The paper addresses the problem of evaluating discriminatory bias in machine-learning models. Specifically, the authors focus on possible discrimination in machine-learning systems deployed in high-risk fields (such as medical care, transportation, recruitment, and justice). Existing fairness evaluation techniques rely mainly on group fairness and individual fairness metrics, but these metrics are usually hard to satisfy simultaneously, and even different group fairness metrics may be incompatible with each other. To meet this challenge, the paper takes a manifold perspective and proposes a "Harmonic Fairness Measure via Manifolds (HFM)", which evaluates the discrimination level of classifiers from both the individual and the group aspect. In addition, because directly computing distances between sets is expensive, the paper proposes an approximation algorithm, "Approximation of Distance between Sets (ApproxDist)", to improve computational efficiency.
### Core contributions of the paper:
1. **Propose a new fairness measurement method**: The Harmonic Fairness Measure (HFM) from the manifold perspective can comprehensively evaluate the discrimination level of classifiers from both individual and group aspects.
2. **Design an efficient approximation algorithm**: The Approximation of Distance between Sets (ApproxDist) is used to quickly estimate the distance between different sets, thereby increasing the practical application value of HFM.
3. **Analyze the effectiveness of the approximation algorithm**: Under certain assumptions, the effectiveness of the ApproxDist algorithm is proved and a detailed explanation is provided.
4. **Verify the effectiveness of the method through experiments**: Comprehensive experiments are carried out to show the effectiveness and efficiency of HFM and ApproxDist.
### Key formulas:
- **Distance between sets**:
\[
D(S_0, S_1) \triangleq \max \left\{ \max_{(x, y) \in S_0} \min_{(x', y') \in S_1} d\big((\tilde{x}, y), (\tilde{x}', y')\big),\; \max_{(x', y') \in S_1} \min_{(x, y) \in S_0} d\big((\tilde{x}, y), (\tilde{x}', y')\big) \right\}
\]
- **Fairness measurement of the classifier**:
\[
df(f) = \frac{D_f(S_0, S_1)}{D(S_0, S_1)} - 1
\]
- **Projection function**:
\[
g(x, \hat{y}; w) = g(\tilde{x}, a, \hat{y}; w) = [\hat{y}, x_1, \ldots, x_{n_x}]^T w
\]
- **Time complexity of the approximation algorithm**:
\[
O(m_1 n (\log n + m_2))
\]
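As a concrete reading of the distance and fairness formulas above, the sketch below computes \( D(S_0, S_1) \) by brute force and then \( df(f) \). It assumes that \( d \) is the Euclidean distance, that \( S_0 \) and \( S_1 \) split the samples by a binary sensitive attribute \( a \), and that \( D_f \) is \( D \) with the true label \( y \) replaced by the prediction \( \hat{y} = f(x) \). These choices, the toy data, and the names `set_distance` and `df_measure` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def set_distance(S0: np.ndarray, S1: np.ndarray) -> float:
    """Exact D(S0, S1) by brute force; rows are points, d is assumed Euclidean."""
    # All pairwise distances, shape (|S0|, |S1|): O(|S0| * |S1|) time and memory.
    dist = np.linalg.norm(S0[:, None, :] - S1[None, :, :], axis=-1)
    # Two directed terms: worst nearest-neighbour distance from each set to the other.
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))

def df_measure(X_tilde, a, y, y_hat) -> float:
    """df(f) = D_f(S0, S1) / D(S0, S1) - 1.

    Assumptions (illustrative): S0 / S1 are the samples with a == 0 / a == 1;
    D pairs the non-sensitive features X_tilde with the true labels y, and D_f
    pairs them with the classifier's predictions y_hat.
    """
    a = np.asarray(a)
    true_pts = np.column_stack([X_tilde, y])
    pred_pts = np.column_stack([X_tilde, y_hat])
    D = set_distance(true_pts[a == 0], true_pts[a == 1])
    D_f = set_distance(pred_pts[a == 0], pred_pts[a == 1])
    return D_f / D - 1.0

# Hypothetical toy data: one non-sensitive feature, binary sensitive attribute a.
X_tilde = np.array([[0.0], [1.0], [0.5], [2.0]])
a = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
print(df_measure(X_tilde, a, y, y_hat=y))                      # 0.0: predictions equal labels, so D_f == D
print(df_measure(X_tilde, a, y, y_hat=np.array([0, 1, 1, 1])))  # about 0.118
```

In this sketch \( df(f) \) is zero exactly when \( D_f = D \), i.e. when swapping labels for predictions leaves the between-group distance unchanged. The quadratic pairwise matrix in `set_distance` is the kind of cost that motivates ApproxDist.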
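The projection function and the complexity bound above suggest how a projection-based approximation can avoid comparing every pair of points. The sketch below is one plausible reading, not the authors' ApproxDist: in each of \( m_1 \) rounds it draws a random direction \( w \) (an assumption), scores every point with \( g \) (rows are taken to be \( [\hat{y}, x_1, \ldots, x_{n_x}] \), so `S @ w` is \( g \)), sorts by score, and computes true distances only to the \( m_2 \) nearest candidates on each side in projected order. Per round, sorting costs \( O(n \log n) \) and the candidate checks cost \( O(n m_2) \), which matches \( O(m_1 n (\log n + m_2)) \) when the feature dimension is treated as a constant. The name `approx_set_distance` and the Euclidean \( d \) are hypothetical choices.

```python
import numpy as np

def approx_set_distance(S0, S1, m1=10, m2=5, seed=0):
    """Projection-based estimate of D(S0, S1) (illustrative sketch, not the authors' code).

    Rows of S0 / S1 are points such as [y_hat, x_1, ..., x_{n_x}], so S @ w is the
    projection g(x, y_hat; w). d is assumed Euclidean; m2 must be >= 1.
    """
    rng = np.random.default_rng(seed)
    # best_01[i]: smallest true distance found so far from S0[i] to any point of S1.
    best_01 = np.full(len(S0), np.inf)
    best_10 = np.full(len(S1), np.inf)

    def update(A, B, pA, pB, best):
        order = np.argsort(pB)                  # sort B by projected value: O(n log n)
        pos = np.searchsorted(pB[order], pA)    # where each A-point falls in that order
        for i in range(len(A)):
            lo, hi = max(pos[i] - m2, 0), min(pos[i] + m2, len(B))
            cand = B[order[lo:hi]]              # at most 2*m2 neighbours in projected order
            best[i] = min(best[i], np.linalg.norm(cand - A[i], axis=1).min())

    for _ in range(m1):
        w = rng.standard_normal(S0.shape[1])    # random projection direction (assumed)
        p0, p1 = S0 @ w, S1 @ w                 # g evaluated on every point
        update(S0, S1, p0, p1, best_01)
        update(S1, S0, p1, p0, best_10)

    # Each best[i] upper-bounds the exact nearest-neighbour distance,
    # so the returned value is never smaller than the exact D(S0, S1).
    return float(max(best_01.max(), best_10.max()))

# Hypothetical usage on random data.
rng = np.random.default_rng(42)
S0, S1 = rng.random((200, 4)), rng.random((150, 4))
print(approx_set_distance(S0, S1, m1=5, m2=3))
```

In this sketch, raising \( m_1 \) or \( m_2 \) (with a fixed seed) can only tighten the estimate at the cost of more work, and once \( m_2 \geq \max(|S_0|, |S_1|) \) every window covers the whole other set, so the result equals the exact distance.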
### Experimental results:
- **RQ1**: Compared with existing state-of-the-art fairness measures, HFM captures the discrimination level of classifiers more effectively and evaluates it from both the individual and the group aspect.
- **RQ2**: ApproxDist closely approximates the directly calculated distance and is significantly more efficient than direct calculation.
- **RQ3**: The choice of the hyper-parameters \( m_1 \) and \( m_2 \) affects the approximation results, but with suitably chosen values the algorithm reaches the approximate solution with high probability.
Through these contributions, the paper provides a new and effective tool for evaluating the fairness of machine-learning models, together with an efficient way to compute it in practical applications.