Abstract: We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria -- the core and its relaxation, fully justified representation (FJR) -- to this setting. We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.
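
The two criteria named in the abstract can be made concrete on a tiny instance. The following is a minimal sketch, not the paper's code. It assumes the usual reading of the definitions: a group S with at least n/k agents blocks the core (with approximation factor `alpha`) if every member's loss in S, scaled by `alpha`, is strictly below its current loss, and it blocks FJR if `alpha` times the largest loss in S is strictly below the smallest current loss among its members. The maximum-distance loss and the names `max_loss` and `violations` are illustrative assumptions, and the search is exponential, so it is only usable for very small n.

```python
from itertools import combinations
from math import ceil


def max_loss(i, group, dist):
    """Assumed loss: distance from agent i to the farthest other agent in group."""
    return max((dist[i][j] for j in group if j != i), default=0.0)


def violations(clusters, dist, k, alpha=1.0, loss=max_loss):
    """Brute-force search for blocking groups.

    Returns (core_group, fjr_group): the first group found that violates
    the alpha-core and alpha-FJR conditions respectively, or None for each
    condition that no group violates.
    """
    n = len(dist)
    # Loss each agent currently incurs in its assigned cluster.
    current = {i: loss(i, c, dist) for c in clusters for i in c}
    core_hit = fjr_hit = None
    for size in range(ceil(n / k), n + 1):
        for s in combinations(range(n), size):
            new = [loss(i, s, dist) for i in s]
            # Core: every member of s strictly improves (after scaling).
            if core_hit is None and all(
                alpha * l < current[i] for i, l in zip(s, new)
            ):
                core_hit = s
            # FJR: worst loss inside s beats the best current loss in s.
            if fjr_hit is None and alpha * max(new) < min(current[i] for i in s):
                fjr_hit = s
    return core_hit, fjr_hit


# Six agents on a line, two natural groups of three.
pos = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pos] for a in pos]

print(violations([[0, 1, 2], [3, 4, 5]], dist, k=2))     # (None, None)
print(violations([[0, 3], [1, 2, 4, 5]], dist, k=2))     # ((0, 1, 2), (0, 1, 2))
```

Under these definitions any group that blocks FJR also blocks the core, which is why FJR is the weaker (relaxed) requirement.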


### What problem does this paper attempt to solve?

This paper addresses proportional fairness in non-centroid clustering. Specifically, it extends the proportional fairness framework to non-centroid clustering and studies how fair clusterings can be obtained in this setting.

#### Background and Motivation

1. **Limitations of Traditional Clustering Methods**:
   - In traditional centroid clustering, the loss of each data point (or agent) is its distance to the centroid of the cluster it belongs to. This is reasonable in some applications, such as facility location.
   - In other applications, no centroid needs to be defined and the data points are grouped directly. For example, in federated learning, document clustering, medical image segmentation, and social network segmentation, the relationships among data points matter more than the location of a centroid.
2. **Importance of Fairness**:
   - In these applications, the loss of a data point depends on the other data points in its cluster, so it matters which points are grouped together. Proportional fairness guarantees ensure that no sufficiently large and cohesive group of data points would benefit from breaking away to form its own cluster.

#### Main Research Questions

1. **Can Convincing Proportional Fairness Guarantees be Provided for Non-Centroid Clustering?**
   - The paper explores whether proportional fairness guarantees similar to those known for centroid clustering can be achieved in non-centroid clustering.
2. **Are Existing Centroid Clustering Algorithms Applicable to Non-Centroid Clustering?**
   - It studies whether algorithms developed for centroid clustering (such as GreedyCapture) can be adapted to non-centroid clustering, and evaluates the guarantees they provide.
3. **How to Audit the Proportional Fairness of a Given Clustering?**
   - It proposes an efficient auditing algorithm that estimates, up to a constant factor, the FJR approximation of any given clustering solution.

#### Specific Contributions

- **Adapted Two Proportional Fairness Criteria**: The core and its relaxation, fully justified representation (FJR). The paper shows that FJR can be satisfied exactly under arbitrary loss functions, whereas the core can only be approximated, and only under structured loss functions.
- **Designed New Algorithms**: The paper proposes a new (but inefficient) algorithm, GreedyCohesiveClustering, which achieves FJR exactly, and shows that an adaptation of the efficient GreedyCapture algorithm achieves a constant approximation of FJR.
- **Experimental Verification**: Experiments on real datasets suggest that GreedyCapture provides significantly better FJR and core approximations than traditional clustering algorithms, for both average and maximum losses, with only a modest loss in traditional clustering objectives.

#### Conclusion

This paper studies proportional fairness in non-centroid clustering and contributes both theoretical guarantees and algorithms, providing new directions and tools for future research.
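
The summary does not spell out how GreedyCapture works, so the following is only a hedged sketch of the ball-growing idea as it is usually described: balls grow at the same rate around every unclustered agent, and the first ball to contain ⌈n/k⌉ unclustered agents becomes a cluster. The function name `greedy_capture`, the use of a precomputed distance matrix, the tie-breaking by lowest index, and the choice to take every agent inside the winning ball are assumptions made for illustration, not details confirmed by the text above.

```python
from math import ceil


def greedy_capture(dist, k):
    """Sketch of a GreedyCapture-style procedure for non-centroid clustering.

    dist is an n x n symmetric distance matrix with zeros on the diagonal.
    Returns a list of clusters (sorted lists of agent indices).
    """
    n = len(dist)
    target = ceil(n / k)          # size a ball must reach to form a cluster
    remaining = set(range(n))
    clusters = []
    while remaining:
        need = min(target, len(remaining))
        best = None               # (radius, center) of the first ball to fill up
        for i in sorted(remaining):
            # Radius at which the ball around i holds `need` remaining agents
            # (agent i itself counts, at distance 0).
            radii = sorted(dist[i][j] for j in remaining)
            r = radii[need - 1]
            if best is None or r < best[0]:
                best = (r, i)
        radius, center = best
        # Assumption: the cluster is everything inside the winning ball.
        ball = sorted(j for j in remaining if dist[center][j] <= radius)
        clusters.append(ball)
        remaining -= set(ball)
    return clusters


# Six agents on a line, two natural groups of three.
pos = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pos] for a in pos]
print(greedy_capture(dist, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

Because every cluster except possibly the last contains at least ⌈n/k⌉ agents, this sketch never produces more than k clusters. Each round costs O(n² log n) as written, which is polynomial but not optimized.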

Proportional Fairness in Non-Centroid Clustering

Proportionally Representative Clustering

Proportional Fairness in Clustering: A Social Choice Perspective

Fair Labeled Clustering

On the cost of essentially fair clusterings

Cluster-level Group Representativity Fairness in $k$-means Clustering

Doubly Constrained Fair Clustering

A Gibbs Posterior Framework for Fair Clustering

Coresets for Clustering with Fairness Constraints

Fair Clustering: A Causal Perspective

Fair Clustering: Critique, Caveats, and Future Directions

Proportional Fairness in Federated Learning

The Fairness-Quality Trade-off in Clustering

Representativity Fairness in Clustering

Fair-Capacitated Clustering

Fair Clustering via Hierarchical Fair-Dirichlet Process

Fair Clustering Using Antidote Data

Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights

Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost

Fair Minimum Representation Clustering via Integer Programming