Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn
2024-05-02
Abstract: Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $\Omega$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1\pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space, the cost of a solution $S$ is $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$.
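To make the cost definition concrete, here is a minimal Python sketch of the (weighted) $k$-means cost together with a spot check of the $(1\pm\varepsilon)$ condition for a single candidate solution. The function names `kmeans_cost` and `within_coreset_bound` and the toy data are assumptions made for illustration and do not come from the paper. The summary used here simply merges duplicate points, so it is not the paper's sensitivity sampling. A true coreset must satisfy the bound for every candidate solution $S$, which a check against one $S$ cannot establish.

```python
def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum over p of w(p) * min over s of ||p - s||^2."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted point set P
    total = 0.0
    for p, w in zip(points, weights):
        # squared Euclidean distance from p to its nearest center
        nearest = min(sum((pi - si) ** 2 for pi, si in zip(p, s)) for s in centers)
        total += w * nearest
    return total


def within_coreset_bound(points, summary, weights, centers, eps):
    """Check |cost(summary, S) - cost(P, S)| <= eps * cost(P, S) for one solution S."""
    full = kmeans_cost(points, centers)
    approx = kmeans_cost(summary, centers, weights)
    return abs(approx - full) <= eps * full


# Toy data (hypothetical): P contains a duplicated point.
P = [(0, 0), (0, 0), (1, 0), (4, 0)]
S = [(0, 0), (4, 0)]

# Illustrative summary: distinct points weighted by their multiplicity.
summary = [(0, 0), (1, 0), (4, 0)]
summary_weights = [2.0, 1.0, 1.0]

full_cost = kmeans_cost(P, S)
summary_cost = kmeans_cost(summary, S, summary_weights)
```

Because merging duplicates changes no distances, the summary reproduces the cost exactly in this toy case. An actual coreset construction trades a small relative error for a summary far smaller than $P$.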
Data Structures and Algorithms