iDISQUE: Tuning High-Dimensional Similarity Queries in DHT Networks

Xiaolong Zhang, Lidan Shou, Kian-Lee Tan, Gang Chen
DOI: https://doi.org/10.1007/978-3-642-12026-8_4
2010-01-01
Abstract: In this paper, we propose a fully decentralized framework called iDISQUE that supports tunable approximate similarity queries over high-dimensional data in DHT networks. The iDISQUE framework uses a distributed indexing scheme to organize data summary structures called iDisques, which describe the cluster information of the data on each peer. iDisques are published through a locality-preserving mapping scheme, and approximate similarity queries are resolved using the resulting distributed index. The accuracy of query results can be tuned through both the publishing cost and the query cost. We employ a multi-probe technique to reduce the index size without compromising the effectiveness of queries. We also propose an effective load-balancing technique based on multi-probing. Experiments on real and synthetic datasets confirm the effectiveness and efficiency of iDISQUE.
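The abstract does not specify the mapping or the probing strategy, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a cluster summary is a (center, radius) pair, that the locality-preserving mapping is a projection onto one direction quantized into buckets of a fixed width, and that a plain dictionary stands in for the DHT. All names (`bucket_of`, `publish`, `query`, `probes`) are hypothetical.

```python
import math
from collections import defaultdict


def bucket_of(point, direction, width):
    """Map a point to an integer key: project onto `direction`, then quantize.

    Nearby points tend to receive equal or adjacent keys, which is the
    locality-preserving property this sketch relies on (an assumption).
    """
    dot = sum(p * d for p, d in zip(point, direction))
    return math.floor(dot / width)


def publish(index, peer_id, center, radius, direction, width):
    """Store one cluster summary under the key of its center.

    `index` is a dict standing in for the DHT (key -> list of summaries).
    Each summary is published once, keeping the index small.
    """
    key = bucket_of(center, direction, width)
    index[key].append((peer_id, tuple(center), radius))


def query(index, q, direction, width, probes=0):
    """Return the peers whose summaries fall in the probed buckets.

    `probes` is the number of neighbouring buckets visited on each side of
    the query's own bucket. More probes cost more lookups but can recover
    summaries that landed just across a bucket boundary.
    """
    home = bucket_of(q, direction, width)
    peers = set()
    for key in range(home - probes, home + probes + 1):
        for peer_id, _center, _radius in index.get(key, []):
            peers.add(peer_id)
    return peers


def new_index():
    """Create an empty stand-in index."""
    return defaultdict(list)
```

In this sketch the `probes` parameter plays the role of the query-side tuning knob: raising it spends more lookups per query instead of publishing each summary under several keys.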