Diversity maximization in doubling metrics

Alfonso Cevallos, Friedrich Eisenbrand, Sarah Morell
DOI: https://doi.org/10.48550/arXiv.1809.09521
2018-09-25
Abstract: Diversity maximization is an important geometric optimization problem with many applications in recommender systems, machine learning or search engines among others. A typical diversification problem is as follows: Given a finite metric space $(X,d)$ and a parameter $k \in \mathbb{N}$, find a subset of $k$ elements of $X$ that has maximum diversity. There are many functions that measure diversity. One of the most popular measures, called remote-clique, is the sum of the pairwise distances of the chosen elements. In this paper, we present novel results on three widely used diversity measures: Remote-clique, remote-star and remote-bipartition. Our main results are polynomial time approximation schemes for these three diversification problems under the assumption that the metric space is doubling. This setting has been discussed in the recent literature. The existence of such a PTAS, however, was left open. Our results also hold in the setting where the distances are raised to a fixed power $q\geq 1$, giving rise to more variants of diversity functions, similar in spirit to the variations of clustering problems depending on the power applied to the distances. Finally, we provide a proof of NP-hardness for remote-clique with squared distances in doubling metric spaces.
Discrete Mathematics
What problem does this paper attempt to address?
The paper aims to provide polynomial-time approximation schemes (PTAS) for three commonly used diversity maximization problems (remote-clique, remote-star, and remote-bipartition) in metric spaces of fixed doubling dimension. Specifically, its goals are:

1. **Propose PTAS**: Given a metric space \((X, d)\) and a parameter \(k\in\mathbb{N}\), find a subset \(T\subseteq X\) of \(k\) elements whose diversity is maximum. The paper considers three diversity functions (remote-clique, remote-star, and remote-bipartition) and gives a PTAS for each of them under fixed doubling dimension.
2. **Handle the generalized version**: Beyond the standard case \(q = 1\), the paper also studies distances raised to an arbitrary constant power \(q\geq1\). This yields further variants of the diversity functions, much as clustering problems vary with the power applied to the distances.
3. **Prove NP-hardness**: The paper proves that remote-clique with squared Euclidean distances is NP-hard under fixed doubling dimension. According to the summary, no NP-hardness proof for remote-clique in fixed dimension was previously known.

### Main contributions

1. **Existence of PTAS**: In metric spaces of fixed doubling dimension, PTAS exist for the remote-clique, remote-star, and remote-bipartition problems. The result is obtained with a simple algorithm that relies on two key techniques:
   - **Structural properties**: Each instance can be divided into a main cluster of bounded diameter and its complement, and the complement must be part of the optimal solution.
   - **Grid rounding**: The points of the main cluster are rounded to a grid, and the analysis is generalized to the \(q\)-th power of the distances.
2. **Fast PTAS**: For remote-clique in the standard case \(q = 1\), the paper refines the algorithm to run in time \(O(n(k+\epsilon^{-D}))+(\epsilon^{-1}\log k)^{O(\epsilon^{-D})}\cdot k\), where \(D\) is the doubling dimension. For constant \(k\), \(\epsilon\), and \(D\) this is linear in \(n\), which makes the algorithm suitable for large data sets.
3. **PTAS for the minimum bipartition problem**: The paper also provides a PTAS for the minimum bipartition problem (min-bisection) in metric spaces of fixed doubling dimension, for any constant \(q\geq1\).
4. **NP-hardness proof**: The paper is the first to prove that remote-clique with squared Euclidean distances is NP-hard under fixed doubling dimension.

### Related work

- **Standard case**: For \(q = 1\) and general metric spaces, Chandra and Halldórsson gave a detailed study of several diversity measures, including remote-clique, remote-star, and remote-bipartition. They observed that all of these problems are NP-hard and provided approximation algorithms for them.
- **Special cases**: Ravi et al. gave efficient exact algorithms for instances on the real line and approximation factors for the Euclidean plane. Fekete and Meijer gave a PTAS for \(\ell_1\) distances in fixed dimension and improved the approximation factors for the Euclidean plane.

### Conclusion

The paper advances the study of diversity maximization by giving new PTAS and faster algorithms for metric spaces of fixed doubling dimension. Beyond their theoretical interest, the results yield more efficient algorithms for practical applications.
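
### Illustrative sketch of the diversity measures

To make the three diversity measures concrete, the following minimal Python sketch evaluates them on small Euclidean point sets. It is not the paper's algorithm. Remote-clique follows the abstract (the sum of pairwise distances). Remote-star and remote-bipartition are not defined in the text above, so the sketch assumes their standard definitions from the literature: the minimum-weight spanning star, and the minimum-weight cut into two sides where one side has \(\lfloor k/2\rfloor\) elements. The function names, the use of Euclidean distance, and the exhaustive `brute_force_max` baseline are all illustrative assumptions. The baseline takes exponential time and is exactly what a PTAS is meant to avoid.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def remote_clique(points, q=1):
    """Sum of the q-th powers of all pairwise distances."""
    return sum(dist(a, b) ** q for a, b in combinations(points, 2))


def remote_star(points, q=1):
    """Assumed definition: weight of the cheapest star spanning the points."""
    return min(
        sum(dist(center, p) ** q for j, p in enumerate(points) if j != i)
        for i, center in enumerate(points)
    )


def remote_bipartition(points, q=1):
    """Assumed definition: cheapest cut with floor(k/2) points on one side."""
    idx = range(len(points))
    best = float("inf")
    for left in combinations(idx, len(points) // 2):
        left_set = set(left)
        cut = sum(
            dist(points[i], points[j]) ** q
            for i in left
            for j in idx
            if j not in left_set
        )
        best = min(best, cut)
    return best


def brute_force_max(X, k, measure, q=1):
    """Exact baseline: evaluate every k-subset. Only feasible for tiny inputs."""
    return max(combinations(X, k), key=lambda T: measure(T, q))
```

For example, `brute_force_max([(0,), (1,), (2,), (10,)], 2, remote_clique)` selects the two endpoints of the line, and passing `q=2` switches every measure to squared distances, the variant for which the paper proves NP-hardness of remote-clique.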