Proportional Fairness in Non-Centroid Clustering

Ioannis Caragiannis, Evi Micha, Nisarg Shah
2024-10-31
Abstract: We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria -- the core and its relaxation, fully justified representation (FJR) -- to this setting. We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.
Machine Learning, Artificial Intelligence, Computer Science and Game Theory
### What problem does this paper attempt to address?

This paper addresses proportional fairness in non-centroid clustering. It extends the proportionally fair clustering framework, previously applied to centroid clustering, to the non-centroid setting and studies which fairness guarantees can be achieved there.

#### Background and Motivation

1. **Limitations of Centroid Clustering**:
   - In centroid clustering, the loss of each data point (agent) is its distance to the centroid assigned to its cluster. This model is natural in some applications, such as facility location.
   - In other applications there is no centroid to define, and the data points are simply partitioned into groups. Examples include federated learning, document clustering, medical image segmentation, and social network segmentation, where an agent's loss depends on the other agents in its cluster rather than on the location of a centroid.
2. **Importance of Fairness**:
   - In these applications, it matters that the data points within each cluster are as similar to one another as possible. Proportional fairness guarantees capture this requirement: no sufficiently large and cohesive group of data points should benefit from breaking away to form its own cluster.

#### Main Research Questions

1. **Can Convincing Proportional Fairness Guarantees be Provided for Non-Centroid Clustering?**
   - The paper explores whether proportional fairness guarantees similar to those known for centroid clustering can be achieved in non-centroid clustering.
2. **Are Existing Centroid Clustering Algorithms Applicable to Non-Centroid Clustering?**
   - It studies whether algorithms developed for centroid clustering, such as GreedyCapture, can be adapted to non-centroid clustering, and evaluates the guarantees they provide.
3. **How to Audit the Proportional Fairness of a Given Clustering?**
   - It designs an efficient auditing algorithm that estimates the FJR approximation of any given clustering solution up to a constant factor.

#### Specific Contributions

- **Two Proportional Fairness Criteria**: The paper adapts the core and its relaxation, fully justified representation (FJR), to non-centroid clustering. It shows that FJR can be achieved exactly under arbitrary loss functions, whereas the core can only be approximated, and only under structured loss functions.
- **New Algorithms**: It designs a new (but inefficient) algorithm, GreedyCohesiveClustering, which achieves FJR exactly, and shows that an adaptation of the efficient GreedyCapture algorithm achieves a constant approximation of FJR.
- **Experimental Evaluation**: Experiments on real data sets suggest that GreedyCapture yields considerably better FJR and core approximations than traditional clustering algorithms under both average and maximum loss, while incurring only a modest loss in common clustering objectives.

#### Conclusion

The paper develops theoretical guarantees and algorithms for proportional fairness in non-centroid clustering, providing new directions and tools for future research.
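To make the algorithmic ideas above concrete, here is a minimal Python sketch of a GreedyCapture-style procedure together with a brute-force FJR check. It is an illustration under stated assumptions, not the authors' implementation. Agents are assumed to be points in Euclidean space, the loss is assumed to be the maximum distance to another member of the cluster, and a clustering is taken to be α-FJR when no group S of at least n/k agents satisfies α · max over i in S of loss_i(S) < min over i in S of loss_i(own cluster). The function names `greedy_capture`, `max_loss`, and `fjr_factor_bruteforce` are hypothetical, and the brute-force check is exponential, so it is not the efficient auditing algorithm described in the paper.

```python
import math
from itertools import combinations


def greedy_capture(points, k):
    """GreedyCapture-style sketch (assumption: Euclidean points, balls centred on agents).

    Repeatedly finds the smallest ball around an unassigned agent that contains
    ceil(n/k) unassigned agents (or all of them, if fewer remain) and turns
    those agents into a cluster.
    """
    n = len(points)
    threshold = math.ceil(n / k)
    remaining = set(range(n))
    clusters = []
    while remaining:
        size = min(threshold, len(remaining))
        best = None  # (radius, members) of the smallest capturing ball so far
        for c in sorted(remaining):
            # The `size` unassigned agents closest to c (ties broken by index).
            nearest = sorted(
                remaining, key=lambda j: (math.dist(points[c], points[j]), j)
            )[:size]
            radius = math.dist(points[c], points[nearest[-1]])
            if best is None or radius < best[0]:
                best = (radius, nearest)
        clusters.append(sorted(best[1]))
        remaining -= set(best[1])
    return clusters


def max_loss(points, i, group):
    """Maximum-distance loss of agent i within `group` (assumed loss function)."""
    return max(math.dist(points[i], points[j]) for j in group)


def fjr_factor_bruteforce(points, clusters, k, loss=max_loss):
    """Smallest alpha >= 1 for which `clusters` is alpha-FJR, by enumeration.

    Exponential in the number of agents; intended for tiny examples only.
    """
    n = len(points)
    cluster_of = {i: c for c in clusters for i in c}
    current = [loss(points, i, cluster_of[i]) for i in range(n)]
    worst = 1.0
    for size in range(math.ceil(n / k), n + 1):
        for group in combinations(range(n), size):
            deviating = max(loss(points, i, group) for i in group)
            staying = min(current[i] for i in group)
            if deviating == 0:
                if staying > 0:
                    return math.inf  # no finite alpha rules out this deviation
                continue
            worst = max(worst, staying / deviating)
    return worst


# Usage on a toy instance with two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clustering = greedy_capture(pts, k=2)
print(clustering, fjr_factor_bruteforce(pts, clustering, k=2))
```

On this toy instance the greedy procedure recovers the two natural groups, while a clustering that mixes them gives a factor well above 1 because each tight group of three would prefer to deviate.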