Abstract: Constrained clustering problems generalize classical clustering formulations, e.g., $k$-median and $k$-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$ loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to $f(k, m, \varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon}\right)^{O(m)}$. As specific applications, we get the following results:

- First FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.

Our work generalizes the known results to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
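To make the outlier objective concrete, the following is a minimal sketch of how the cost of a fixed set of centers could be evaluated when $m$ points may be left out. It is not the paper's reduction and it ignores capacities and other constraints; the function name `outlier_clustering_cost` and its parameters are hypothetical, and Euclidean distance is assumed purely for illustration.

```python
import math


def outlier_clustering_cost(points, centers, m, power=1):
    """Cost of serving points by their nearest center, dropping the m costliest.

    Illustrative assumption: unconstrained assignment with Euclidean distance.
    power=1 corresponds to the k-median objective, power=2 to k-means.
    """
    # Cost of each point with respect to its closest center.
    costs = sorted(
        min(math.dist(p, c) for c in centers) ** power for p in points
    )
    # For fixed centers and no constraints, the best outliers are the
    # m points with the largest costs, so keep the n - m cheapest.
    keep = max(len(costs) - m, 0)
    return sum(costs[:keep])


points = [(0, 0), (1, 0), (10, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
cost_without_outliers = outlier_clustering_cost(points, centers, m=0)
cost_with_one_outlier = outlier_clustering_cost(points, centers, m=1)
```

In this toy instance the far point at `(100, 0)` dominates the cost when `m=0` and is discarded when `m=1`. Under capacity or other feasibility constraints the choice of outliers is no longer this simple, which is the setting the abstract addresses.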

Efficient Clustering with Limited Distance Information

Clustering Protein Sequences Given the Approximation Stability of the Min-Sum Objective Function

A Statistical Information-Based Clustering Approach in Distance Space

Distributed Information Theoretic Clustering

Hybrid k-Clustering: Blending k-Median and k-Center

ThetA -- fast and robust clustering via a distance parameter

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Clustering with Distributed Data

Clustering Stable Instances of Euclidean k-means

Clustering of high-dimensional observations

Clustering What Matters in Constrained Settings

Approximation Algorithms for Clustering with Dynamic Points

Faster Parallel Exact Density Peaks Clustering

A distance-type-insensitive clustering approach.

$k$-Center Clustering in Distributed Models

Scalable Density-Based Distributed Clustering

Fast Algorithms for Distributed K-Clustering with Outliers.

Optimal Time Bounds for Approximate Clustering

Distributed Kernel K-Means for Large Scale Clustering

Analysis of Agglomerative Clustering

Sparse Embedded K-Means Clustering.