A High Dimensional Index Based on Relative Distance Hashing Method

LUO Jizhou, LI Jianzhong, ZHU Yuanyuan, GAO Hong
DOI: https://doi.org/10.3778/j.issn.1673-9418.2008.01.003
2008-01-01
Abstract: Many indexes have been designed to process near neighbor search and range queries efficiently in high dimensional data spaces. It has been shown, however, that such indexes can hardly outperform a linear scan because of the high dimensionality. This paper proposes a two-level hierarchical index that supports both nearest neighbor search and range queries in high dimensional metric data spaces. A notable characteristic of this index is that data clustering is completed automatically during index construction; as a result, the index is faster than a linear scan on clustered datasets and almost as fast as a linear scan on other datasets. The top level is a binary tree of well-organized reference points. The bottom level is a series of dynamic hash tables in which the dataset is hashed according to the distances from the data points to the reference points. Query processing is reduced to scanning a few buckets and using the candidate answers to prune the many buckets that are irrelevant to the query. Theoretical analysis and experiments show that the index structure performs well both in memory and on secondary storage.
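The bucket-pruning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's data structure: the binary tree of reference points and the dynamic hash tables are replaced here by a flat list of reference points and a fixed-width distance ring per bucket, and the names `RingIndex`, `width`, `insert` and `range_query` are hypothetical. The sketch only shows how the triangle inequality lets a range query skip buckets whose distance interval to a reference point cannot contain an answer.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance; any metric would do for the pruning argument."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RingIndex:
    """Hypothetical simplification: buckets keyed by (reference point, distance ring)."""

    def __init__(self, refs, width):
        self.refs = refs              # reference points (assumed given, not learned)
        self.width = width            # assumed fixed ring width, not a dynamic hash
        self.buckets = defaultdict(list)

    def insert(self, p):
        # Assign p to its nearest reference point, then hash by quantized distance.
        dists = [dist(p, r) for r in self.refs]
        i = min(range(len(self.refs)), key=dists.__getitem__)
        ring = int(dists[i] // self.width)
        self.buckets[(i, ring)].append(p)

    def range_query(self, q, radius):
        # Returns the matching points and the number of buckets actually scanned.
        qd = [dist(q, r) for r in self.refs]
        hits, scanned = [], 0
        for (i, ring), pts in self.buckets.items():
            lo, hi = ring * self.width, (ring + 1) * self.width
            # Triangle inequality: |d(q, r) - d(p, r)| <= d(q, p).
            # If the ring [lo, hi) lies outside [d(q, r) - radius, d(q, r) + radius],
            # no point in this bucket can be within radius of q.
            if lo > qd[i] + radius or hi < qd[i] - radius:
                continue
            scanned += 1
            hits.extend(p for p in pts if dist(p, q) <= radius)
        return hits, scanned
```

On clustered data most buckets fall outside the query's distance band and are skipped, which is consistent with the abstract's claim of a speedup on clustered datasets; on uniformly spread data few buckets can be pruned and the cost approaches that of a linear scan.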