Streaming Fair K-Center Clustering over Massive Dataset with Performance Guarantee

Zeyu Lin, Longkun Guo, Chaoqi Jia
DOI: https://doi.org/10.1007/978-981-97-2259-4_8
2024-01-01
Abstract: Emerging applications pose challenges for incorporating fairness constraints into k-center clustering in the streaming setting. Unlike the traditional k-center problem, the fair variant requires that the input points be divided into disjoint groups and that the number of centers chosen from each group not exceed a given upper bound. Motivated by applications of fair k-center to massive datasets, we consider the problem in the streaming setting, where data points arrive one at a time and each point can be processed on arrival. As the main contributions, we propose a two-pass streaming algorithm for the fair k-center problem with two groups, achieving an approximation ratio of 3 + epsilon while consuming only O(k log n) memory and O(k) update time, matching the state-of-the-art ratio for the offline setting. We then show that the algorithm can be easily adapted into a one-pass streaming algorithm with an approximation ratio of 7 + epsilon and the same memory complexity and update time. Moreover, we show that our algorithm can be simply tuned to handle an arbitrary number of groups while achieving the same ratio and space complexity. Lastly, we carry out extensive experiments to evaluate the practical performance of our algorithm against state-of-the-art algorithms.
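To make the problem statement concrete, the following minimal Python sketch evaluates a candidate solution: it computes the k-center objective (the largest distance from any point to its nearest center) and checks the per-group upper bounds. It is not the paper's streaming algorithm. The function names `kcenter_radius` and `is_fair`, the Euclidean metric, and the toy data are assumptions made purely for illustration.

```python
from math import dist  # Euclidean distance; the choice of metric is an assumption


def kcenter_radius(points, centers):
    """Return the largest distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def is_fair(center_groups, caps):
    """Return True if no group contributes more centers than its upper bound."""
    counts = {}
    for g in center_groups:
        counts[g] = counts.get(g, 0) + 1
    # A group missing from `caps` is treated as having an upper bound of 0.
    return all(counts[g] <= caps.get(g, 0) for g in counts)


# Hypothetical toy input: each item is (coordinates, group label).
data = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.0), "a"),
    ((10.0, 0.0), "b"),
    ((11.0, 0.0), "b"),
]
points = [p for p, _ in data]

# Candidate solution with k = 2: one center from each group.
chosen = [data[0], data[2]]
radius = kcenter_radius(points, [p for p, _ in chosen])
fair = is_fair([g for _, g in chosen], {"a": 1, "b": 1})
```

Under this reading, a c-approximation algorithm returns a fair set of centers whose radius is at most c times the smallest radius achievable by any fair set of at most k centers.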