Scalable and space-efficient Robust Matroid Center algorithms

DOI: https://doi.org/10.1186/s40537-023-00717-4
2023-04-18
Journal of Big Data
Abstract: Given a dataset V of points from some metric space, a popular robust formulation of the k-center clustering problem requires selecting k points (centers) of V which minimize the maximum distance of any point of V from its closest center, excluding the z most distant points (outliers) from the computation of the maximum. In this paper, we focus on an important constrained variant of the robust k-center problem, namely, the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank k built on V. Instantiating the problem with the partition matroid yields a formulation of the fair k-center problem, which has attracted the interest of the ML community in recent years. In this paper, we target accurate solutions of the RMC problem under general matroids, when confronted with large inputs. Specifically, we devise a coreset-based algorithm affording efficient sequential, distributed (MapReduce) and streaming implementations. For any fixed ε > 0, the algorithm returns solutions featuring a (3+ε)-approximation ratio, which is a mere additive term ε away from the 3-approximations achievable by the best known polynomial-time sequential algorithms. Moreover, the algorithm obliviously adapts to the intrinsic complexity of the dataset, captured by its doubling dimension D. For wide ranges of k, z, ε, D, our MapReduce/streaming implementations require two rounds/one pass and substantially sublinear local/working memory. The theoretical results are complemented by an extensive set of experiments on real-world datasets, which provide clear evidence of the accuracy and efficiency of our algorithms and of their improved performance with respect to previous solutions.
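To make the objective in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It only evaluates the robust k-center cost of a given set of centers (the maximum distance to the closest center after discarding the z farthest points) and checks independence in a partition matroid, the case the abstract links to fair k-center. The function names, the Euclidean metric, and the `category`/`caps` inputs are assumptions introduced for illustration.

```python
import math
from collections import Counter

def robust_center_cost(points, centers, z):
    """Max distance of a point from its closest center, ignoring the z farthest points.

    Assumes Euclidean distance; the paper allows any metric space.
    """
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[: len(dists) - z] if z > 0 else dists
    return kept[-1] if kept else 0.0

def is_independent_partition(centers, category, caps):
    """Partition-matroid independence: at most caps[g] centers from each category g.

    `category` maps each point to its category; `caps` maps each category
    to its cap (both hypothetical inputs for this sketch).
    """
    counts = Counter(category[c] for c in centers)
    return all(counts[g] <= caps.get(g, 0) for g in counts)

# Toy one-dimensional instance embedded in the plane.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
category = {(0, 0): "a", (1, 0): "a", (10, 0): "b", (11, 0): "b", (100, 0): "b"}
caps = {"a": 1, "b": 1}  # caps sum to the rank k = 2

centers = [(0, 0), (10, 0)]
cost_no_outliers = robust_center_cost(points, centers, z=0)
cost_one_outlier = robust_center_cost(points, centers, z=1)
feasible = is_independent_partition(centers, category, caps)
```

On this toy instance, allowing a single outlier (z = 1) removes the far point (100, 0) from the maximum, so the cost drops from 90.0 to 1.0, while the chosen centers respect the one-per-category caps.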