Abstract: We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group. Despite significant attention, the salient issue of having incomplete knowledge about the group membership of each point has been only superficially addressed. In this paper, we consider a setting where errors exist in the assigned group memberships. We introduce a simple and interpretable family of error models that require a small number of parameters to be given by the decision maker. We then present an algorithm for fair clustering with provable robustness guarantees. Our framework enables the decision maker to trade off between robustness and clustering quality. Unlike previous work, our algorithms are backed by worst-case theoretical guarantees. Finally, we empirically verify the performance of our algorithm on real-world datasets and show its superior performance over existing baselines.

What problem does this paper attempt to address?

This paper addresses uncertainty about group membership in fair clustering. Specifically, it asks how to keep clustering results fair when the group membership recorded for each data point may be incorrect or incomplete.

### Problem Background

The traditional fair clustering problem requires that the proportion of each group in each cluster be close to that group's proportion in the entire dataset. In practice, however, group membership information may be incomplete, noisy, or even maliciously tampered with. For example, in advertising placement, group membership may be estimated by a machine-learning model; in loan approval, estimating group membership may be illegal or infeasible. How to perform fair clustering under this kind of uncertainty is therefore an important research problem.

### Main Contributions of the Paper

1. **Introducing New Error Models**: The paper proposes three error models: Bounded Aggregation Error (BAE), Bounded Pairwise Error (BPE), and Bounded Aggregation and Pairwise Error (BAPE). These models let decision-makers specify a small number of parameters based on the available information, rather than providing complete probability information for each point. In particular, the BAPE model combines the advantages of aggregation error and pairwise error and offers greater flexibility.
2. **Robust Fair Clustering Algorithm**: Based on these error models, the paper proposes a robust fair clustering algorithm that keeps clustering results fair in the presence of errors in group membership. The algorithm comes with a theoretical worst-case guarantee and is shown empirically to outperform existing baselines.
3. **Balancing Robustness and Clustering Quality**: The paper introduces a tolerance parameter \(T\), which lets decision-makers trade off robustness against clustering quality. By adjusting \(T\), a suitable solution can be selected for each application scenario.

### Summary of Mathematical Formulas

- **Fairness Constraints**:
  \[ l_h |C_i| \leq |C_{i,h}| \leq u_h |C_i| \quad \forall i \in S, h \in H \]
  where \(l_h\) and \(u_h\) are the lower and upper proportion bounds for group \(h\), \(C_i\) is the \(i\)-th cluster, and \(C_{i,h}\) is the set of points in the \(i\)-th cluster that belong to group \(h\).
- **Maximum Fairness Violation**:
  \[ \Delta(S, M, m, \phi) = \max_{i \in S, h \in H} \left\{ \frac{|C_{i,h}| + m_{\to h} - u_h |C_i|}{|C_i|}, \frac{l_h |C_i| - (|C_{i,h}| - m_{h \to})}{|C_i|} \right\} \]
- **Fairness Constraints under Tolerance Parameter**:
  \[ (l_h - T)|C_i| \leq |\hat{C}_{i,h}| \leq (u_h + T)|C_i| \quad \forall i \in S, h \in H \]

### Summary

This paper addresses fair clustering under uncertain group membership by introducing new error models and a robust fair clustering algorithm. The work provides rigorous theoretical guarantees and also shows superior performance in practice.
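
As a small illustration of the formulas summarized above, the following Python sketch checks the proportional fairness constraints (optionally relaxed by the tolerance \(T\)) and evaluates the maximum fairness violation for a fixed clustering. It is only a checker under stated assumptions, not the paper's clustering algorithm. The function names `is_fair` and `max_violation`, the dictionaries `m_in` and `m_out`, and the toy data are hypothetical. In particular, the code assumes that \(m_{\to h}\) is a count of points that may be wrongly moved into group \(h\) and \(m_{h \to}\) a count that may be wrongly moved out of it; how these counts follow from the BAE, BPE, or BAPE models is not shown here.

```python
from collections import Counter


def is_fair(assignments, groups, lower, upper, tol=0.0):
    """Check (l_h - T)|C_i| <= |C_{i,h}| <= (u_h + T)|C_i| for all i, h.

    assignments[j] is the cluster of point j, groups[j] its (observed) group.
    lower/upper map each group h to l_h and u_h; tol plays the role of T.
    """
    sizes = Counter(assignments)                # |C_i|
    counts = Counter(zip(assignments, groups))  # |C_{i,h}|
    for i, size in sizes.items():
        for h in lower:
            c = counts[(i, h)]
            if c < (lower[h] - tol) * size or c > (upper[h] + tol) * size:
                return False
    return True


def max_violation(assignments, groups, lower, upper, m_in, m_out):
    """Evaluate the maximum fairness violation over clusters and groups.

    m_in[h] and m_out[h] are ASSUMED per-group counts standing in for
    m_{->h} and m_{h->}; a non-positive result means no bound is violated.
    """
    sizes = Counter(assignments)
    counts = Counter(zip(assignments, groups))
    worst = float("-inf")
    for i, size in sizes.items():
        for h in lower:
            c = counts[(i, h)]
            over = (c + m_in[h] - upper[h] * size) / size
            under = (lower[h] * size - (c - m_out[h])) / size
            worst = max(worst, over, under)
    return worst


# Toy data (hypothetical): two clusters of four points, two groups.
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "b", "b", "a", "a", "a", "b"]
lower = {"a": 0.25, "b": 0.25}
upper = {"a": 0.75, "b": 0.75}

no_error = {"a": 0, "b": 0}
one_flip = {"a": 1, "b": 1}

print(is_fair(assignments, groups, lower, upper))
print(max_violation(assignments, groups, lower, upper, no_error, no_error))
print(max_violation(assignments, groups, lower, upper, one_flip, one_flip))
```

In this toy example the clustering satisfies the constraints on the observed labels, but allowing a single membership flip per group pushes the violation above zero. That gap is what the tolerance parameter \(T\) lets the decision-maker either absorb or rule out.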

Robust Fair Clustering with Group Membership Uncertainty Sets

Fair Labeled Clustering

Doubly Constrained Fair Clustering

On the cost of essentially fair clusterings

Optimal Clustering under Uncertainty

A Gibbs Posterior Framework for Fair Clustering

Cluster-level Group Representativity Fairness in $k$-means Clustering

Multigroup Robustness

Fair Clustering: Critique, Caveats, and Future Directions

Fair-Capacitated Clustering

Representativity Fairness in Clustering

Dependent randomized rounding for clustering and partition systems with knapsack constraints

Adversarially robust clustering with optimality guarantees

FAL-CUR: Fair Active Learning using Uncertainty and Representativeness on Fair Clustering

Robust Optimal Graph Clustering

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Fair Clustering via Hierarchical Fair-Dirichlet Process

Proportional Fairness in Non-Centroid Clustering

Coresets for Clustering with Fairness Constraints

The Fairness-Quality Trade-off in Clustering

Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights