Eclipse: Practicability Beyond kNN and Skyline

Jinfei Liu, Li Xiong, Qiuchen Zhang, Jian Pei, Jun Luo
DOI: https://doi.org/10.48550/arxiv.1707.01223
2017-01-01
Abstract: The $k$ nearest neighbor ($k$NN) query is a fundamental problem in databases. Given a set of multidimensional data points and a query point, $k$NN returns the $k$ nearest neighbors based on a scoring function such as weighted sum given an attribute weight vector. However, the attribute weight vector can be difficult to specify in practice. Skyline returns the points including all possible nearest neighbors without requiring the exact attribute weight vector or a scoring function but the number of returned points can be prohibitively large for practical use. In this paper, we propose a novel \emph{eclipse} definition which provides a more flexible and customizable definition than the classic $1$NN and skyline. In eclipse, users can specify a range of attribute weights and control the number of returned points. We show that both $1$NN and skyline are instantiations of eclipse. To compute eclipse points, we propose a baseline algorithm with time complexity of $O(n^2 2^{d-1})$, and an improved $O(n\log^{d-1} n)$ time transformation-based algorithm by transforming the eclipse problem to the skyline problem, where $n$ is the number of points and $d$ is the number of dimensions. Furthermore, we propose a novel index-based algorithm utilizing duality transform with much better efficiency. The experimental results on the real NBA dataset and the synthetic datasets demonstrate the effectiveness and efficiency of our eclipse algorithms.
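The relationship between the three queries can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: the query point is the origin, smaller attribute values are better, the score is a weighted sum, and a point is "eclipse-dominated" when another point scores no worse for every weight vector in the user-given range (and strictly better for at least one). The range is expressed here as bounds on the ratios $w_j / w_d$, with the last weight fixed to 1. The function names and the toy data are hypothetical. Because the score is linear in the weights, it suffices to compare points at the $2^{d-1}$ corners of the ratio box, which is in the spirit of the $O(n^2 2^{d-1})$ baseline but is not claimed to be the authors' exact algorithm.

```python
from itertools import product


def score(p, w):
    """Weighted-sum score of point p under weight vector w (smaller is better)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def one_nn(points, w):
    """1NN under an exact weight vector: the single best-scoring point."""
    return min(points, key=lambda p: score(p, w))


def skyline(points):
    """Points not dominated in the classic sense (no weights needed)."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]


def eclipse(points, ratio_ranges):
    """Points not dominated for every weight vector in the given range.

    ratio_ranges holds one (low, high) pair per dimension except the last
    (an assumed parameterization: bounds on w_j / w_d, with w_d = 1).
    """
    # All 2^(d-1) corner weight vectors of the ratio box.
    corners = [c + (1.0,) for c in product(*ratio_ranges)]

    def dominates(p, q):
        diffs = [score(q, w) - score(p, w) for w in corners]
        return all(x >= 0 for x in diffs) and any(x > 0 for x in diffs)

    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 6), (2, 3), (4, 2), (6, 1), (5, 5)]

print(one_nn(points, (1.0, 1.0)))        # (2, 3)
print(skyline(points))                   # [(1, 6), (2, 3), (4, 2), (6, 1)]
print(eclipse(points, [(0.25, 2.0)]))    # [(2, 3), (4, 2), (6, 1)]
print(eclipse(points, [(1.0, 1.0)]))     # [(2, 3)]
```

On this toy data the eclipse result sits between the two classic queries: a moderate weight range returns three of the four skyline points, and shrinking the range to a single ratio returns the same point as 1NN.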