Efficient Approximate Algorithms for the Closest Pair Problem in High Dimensional Spaces.

Xingyu Cai, Sanguthevar Rajasekaran, Fan Zhang
DOI: https://doi.org/10.1007/978-3-319-93040-4_13
2018-01-01
Abstract: The Closest Pair Problem (CPP) is a fundamental problem with a wide range of applications in data mining, such as unsupervised data clustering and user pattern similarity search. A number of exact and approximate algorithms have been proposed to solve it in low-dimensional spaces. In this paper, we address the problem when the metric space is of high dimension. For example, drug-target or movie-user interaction data could contain as many as hundreds of features. To solve this problem under the $\ell_2$ norm, we present two novel approximate algorithms. Our algorithms are based on the novel idea of projecting the points onto the real line. We prove high-probability bounds on the run time and accuracy of both proposed algorithms. Both algorithms are evaluated via comprehensive experiments and compared with the best-known existing approaches. The experiments reveal that our proposed approaches outperform the existing methods.
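The abstract gives only the core idea: project the points onto the real line. As a rough illustration of that idea, and not the authors' two algorithms, the following minimal sketch projects the points onto random directions, sorts them by projected value, and compares each point only with its next few neighbors in that order. The function name and the `trials`, `window`, and `seed` parameters are hypothetical, introduced here for illustration, and the sketch carries none of the paper's probability bounds.

```python
import math
import random


def approx_closest_pair(points, trials=10, window=5, seed=0):
    """Heuristic closest pair under the l2 norm via random 1-D projections.

    Assumption: `trials`, `window`, and `seed` are illustrative parameters
    that do not come from the paper.
    Returns (distance, (i, j)) with i < j indexing into `points`.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    best_dist, best_pair = math.inf, None
    for _ in range(trials):
        # Draw a random direction with Gaussian components.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Project every point onto that direction (a point on the real line).
        proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
        # Sort point indices by their projected value.
        order = sorted(range(len(points)), key=proj.__getitem__)
        # Points close in the original space are also close on the line,
        # so only compare each point with its next `window` neighbors.
        for pos, i in enumerate(order):
            for j in order[pos + 1 : pos + 1 + window]:
                dist = math.dist(points[i], points[j])
                if dist < best_dist:
                    best_dist, best_pair = dist, (min(i, j), max(i, j))
    return best_dist, best_pair
```

The returned distance is never smaller than the true minimum, because every candidate is an actual pair of input points. With `window` set to at least the number of points minus one, the sketch degenerates into an exact brute-force search. Smaller windows trade accuracy for speed.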