Robust Fair Clustering with Group Membership Uncertainty Sets

Sharmila Duppala, Juan Luque, John P. Dickerson, Seyed A. Esmaeili
2024-06-02
Abstract: We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group. Despite significant attention, the salient issue of having incomplete knowledge about the group membership of each point has been superficially addressed. In this paper, we consider a setting where errors exist in the assigned group memberships. We introduce a simple and interpretable family of error models that require a small number of parameters to be given by the decision maker. We then present an algorithm for fair clustering with provable robustness guarantees. Our framework enables the decision maker to trade off between the robustness and the clustering quality. Unlike previous work, our algorithms are backed by worst-case theoretical guarantees. Finally, we empirically verify the performance of our algorithm on real-world datasets and show its superior performance over existing baselines.
Machine Learning, Artificial Intelligence, Data Structures and Algorithms
What problem does this paper attempt to address?
This paper addresses the uncertainty of group membership in fair clustering: when the group membership recorded for each data point is incorrect or incomplete, how can the clustering still be guaranteed to be fair?

### Problem Background

The traditional fair clustering problem requires that the proportion of each group in every cluster be close to that group's proportion in the entire dataset. In practice, however, group membership information may be incomplete, noisy, or even maliciously tampered with. For example, in ad placement, group membership may be estimated by a machine learning model; in loan approval, estimating group membership may be illegal or infeasible. How to perform fair clustering under this kind of uncertainty is therefore an important research problem.

### Main Contributions of the Paper

1. **Introducing New Error Models**: The paper proposes three error models: Bounded Aggregation Error (BAE), Bounded Pairwise Error (BPE), and Bounded Aggregation and Pairwise Error (BAPE). These models let the decision maker specify a small number of parameters based on the available information, rather than complete probabilistic information for every point. The BAPE model combines the aggregation and pairwise error bounds and is the most flexible of the three.
2. **Robust Fair Clustering Algorithm**: Based on these error models, the paper proposes a robust fair clustering algorithm that keeps the clustering fair even when group memberships contain errors. The algorithm comes with worst-case theoretical guarantees and, in the paper's experiments, outperforms existing baselines.
3. **Balancing Robustness and Clustering Quality**: The paper introduces a tolerance parameter \(T\) that lets the decision maker trade off robustness against clustering quality. Adjusting \(T\) allows a suitable solution to be selected for each application scenario.

### Summary of Mathematical Formulas

- **Fairness Constraints**:
  \[ l_h |C_i| \leq |C_{i,h}| \leq u_h |C_i| \quad \forall i \in S, h \in H \]
  where \(l_h\) and \(u_h\) are the lower and upper proportion bounds for group \(h\), \(C_i\) is the \(i\)-th cluster, and \(C_{i,h}\) is the set of points in the \(i\)-th cluster that belong to group \(h\).
- **Maximum Fairness Violation**:
  \[ \Delta(S, M, m, \phi) = \max_{i \in S, h \in H} \left\{ \frac{|C_{i,h}| + m_{\to h} - u_h |C_i|}{|C_i|}, \frac{l_h |C_i| - (|C_{i,h}| - m_{h \to})}{|C_i|} \right\} \]
- **Fairness Constraints under Tolerance Parameter**:
  \[ (l_h - T)|C_i| \leq |\hat{C}_{i,h}| \leq (u_h + T)|C_i| \quad \forall i \in S, h \in H \]

### Summary

This paper addresses fair clustering under uncertain group membership by introducing new error models and a robust fair clustering algorithm. The work provides rigorous worst-case theoretical guarantees and also shows superior empirical performance over existing baselines.
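
The fairness constraints and the maximum-violation expression listed above can be made concrete with a small sketch. The code below is an illustration only, not the paper's algorithm: it evaluates the violation of a given clustering and does not construct a robust clustering. The function names, the list-based input format, and the interpretation of `m_in` and `m_out` as per-group counts (standing in for \(m_{\to h}\) and \(m_{h \to}\)) applied uniformly to every cluster are all assumptions made for this example.

```python
from collections import Counter


def max_fairness_violation(assignment, groups, lower, upper, m_in=None, m_out=None):
    """Evaluate the max-violation expression for a fixed clustering.

    assignment[j] is the cluster of point j and groups[j] its recorded group.
    lower/upper map each group h to its bounds l_h and u_h.
    m_in/m_out are hypothetical per-group counts (assumed here to stand for
    m_{->h} and m_{h->}); they default to 0, which gives the nominal violation.
    A result <= 0 means every constraint holds.
    """
    m_in = m_in or {}
    m_out = m_out or {}
    cluster_sizes = Counter(assignment)          # |C_i|
    counts = Counter(zip(assignment, groups))    # |C_{i,h}|
    worst = float("-inf")
    for i, size in cluster_sizes.items():
        for h in lower:
            n_ih = counts[(i, h)]
            # Upper-bound side: group h could be over-represented.
            over = (n_ih + m_in.get(h, 0) - upper[h] * size) / size
            # Lower-bound side: group h could be under-represented.
            under = (lower[h] * size - (n_ih - m_out.get(h, 0))) / size
            worst = max(worst, over, under)
    return worst


def satisfies_tolerance(assignment, groups, lower, upper, T=0.0, eps=1e-12):
    """Check (l_h - T)|C_i| <= |C_{i,h}| <= (u_h + T)|C_i| for all i, h.

    This is equivalent to the nominal violation being at most T;
    eps absorbs floating-point rounding.
    """
    return max_fairness_violation(assignment, groups, lower, upper) <= T + eps


# Toy data: two clusters of four points, two groups "a" and "b".
assignment = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "b", "b", "a", "a", "a", "b"]
lower = {"a": 0.4, "b": 0.4}
upper = {"a": 0.6, "b": 0.6}

nominal = max_fairness_violation(assignment, groups, lower, upper)
with_errors = max_fairness_violation(
    assignment, groups, lower, upper, m_in={"a": 1, "b": 1}, m_out={"a": 1, "b": 1}
)
```

In this toy example, cluster 1 holds three points of group "a" out of four, so it exceeds the upper bound of 0.6; allowing one additional mislabeled point per group makes the worst-case violation larger. A larger \(T\) accepts such a clustering, while a smaller \(T\) rejects it.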