Efficient k-Closest Pair Queries in General Metric Spaces

Yunjun Gao, Lu Chen, Xinhan Li, Bin Yao, Gang Chen
DOI: https://doi.org/10.1007/s00778-015-0383-4
2015-01-01
Abstract: Given two object sets \(P\) and \(Q\), a \(k\)-closest pair (\(k\)CP) query finds the \(k\) closest object pairs from \(P \times Q\). This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in Euclidean space, there is little prior work on metric spaces. In this paper, we study the problem of \(k\)CP query processing in general metric spaces, namely Metric \(k\)CP (M\(k\)CP) search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., M-tree), which can be applied to arbitrary types of data as long as a metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigms, employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, and take advantage of aggressive pruning and compensation to further boost query efficiency. We also derive a node-based cost model for M\(k\)CP retrieval. In addition, we extend our techniques to tackle two interesting variants of M\(k\)CP queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.
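To make the query definition concrete, the following is a minimal brute-force sketch of a \(k\)CP query over an arbitrary metric. It is not one of the paper's algorithms: it uses no index or pruning and simply evaluates the distance for every pair in \(P \times Q\). The names `k_closest_pairs` and `levenshtein` are hypothetical, chosen for illustration, and edit distance is used only as one example of a non-Euclidean metric that satisfies the triangle inequality.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (an example metric, assumed here)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def k_closest_pairs(
    P: Sequence[T],
    Q: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
) -> List[Tuple[float, T, T]]:
    """Brute-force kCP: the k pairs of P x Q with the smallest distance.

    Performs |P| * |Q| distance computations. Ties are broken by the
    positions of the objects in P and Q, so objects need not be comparable.
    """
    if k <= 0:
        return []
    scored = (
        (dist(p, q), i, j)
        for i, p in enumerate(P)
        for j, q in enumerate(Q)
    )
    best = heapq.nsmallest(k, scored)  # sorted by ascending distance
    return [(d, P[i], Q[j]) for d, i, j in best]


if __name__ == "__main__":
    words_p = ["kitten", "flaw", "gumbo"]
    words_q = ["sitting", "lawn", "gambol"]
    for d, p, q in k_closest_pairs(words_p, words_q, levenshtein, k=2):
        print(d, p, q)
```

Because the sketch only calls `dist`, it works for any data type with a defined distance function. The index-based algorithms described in the abstract aim to return the same answer while avoiding most of these distance computations.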