An Encoding-Based Dual Distance Tree High-Dimensional Index

Yi Zhuang,YueTing Zhuang,Fei Wu
DOI: https://doi.org/10.1007/s11432-008-0104-3
2008-01-01
Abstract:The paper proposes a novel symmetrical encoding-based index structure, which is called EDD-tree (for encoding-based dual distance tree), to support fast k -nearest neighbor ( k -NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k -means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start-and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B + -tree. Thus, given a query point, its k -NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.
What problem does this paper attempt to address?