BC-iDistance: Bit-code Based Optimal High-dimensional Index

LIANG Jun-jie, FENG Yu-cai
DOI: https://doi.org/10.3969/j.issn.1000-1220.2007.09.021
2007-01-01
Abstract: Recent literature has proposed a variety of index structures to support high-dimensional KNN queries; among these, approximate vector representation and one-dimensional transformation are effective at breaking the curse of dimensionality. Building on these two techniques, we propose an optimal index called Bit-Code based iDistance (BC-iDistance). To overcome the drawback of iDistance, whose one-dimensional transformation can incur substantial information loss, BC-iDistance uses a novel representation that compactly represents a d-dimensional vector as a 2-dimensional vector. The first component is a distance that reflects the similarity of the d-dimensional data point to its corresponding reference point; the second component is a bit code, with one bit per dimension, that indicates on which side of the reference point the data point lies in that dimension. This representation enables two-level filtering: the first component prunes points that do not fall in similar distance ranges, while the second component filters out points using a lower-bound distance computed from their bit codes. Moreover, the representation allows a single index structure to be used to further speed up processing; we employ the classical B+-tree for this purpose. Our experiments, on both synthetic and real data, show that BC-iDistance outperforms iDistance for KNN search in high-dimensional spaces.
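The two-component representation and the bit-code filter can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names `encode` and `lower_bound`, the use of Euclidean distance, the convention that a bit is 1 when the coordinate is greater than or equal to the reference coordinate, and the particular lower bound (summing the squared query-to-reference gaps over the dimensions where the two bit codes differ). How the paper combines both components into a single B+-tree key is not stated and is not modeled.

```python
import math

def encode(point, ref):
    """Return (distance to ref, bit code) for a d-dimensional point.

    Assumed convention: bit j is 1 when point[j] >= ref[j], else 0.
    """
    dist = math.dist(point, ref)  # Euclidean distance (an assumption)
    code = 0
    for j, (p, o) in enumerate(zip(point, ref)):
        if p >= o:
            code |= 1 << j
    return dist, code

def lower_bound(query, ref, code):
    """Lower bound on dist(query, p) for any point p with this bit code.

    If the query and p lie on opposite sides of ref in dimension j,
    then |query[j] - p[j]| >= |query[j] - ref[j]|. Dimensions where
    the bits agree contribute 0 to the bound.
    """
    _, query_code = encode(query, ref)
    differing = query_code ^ code
    total = 0.0
    for j, (q, o) in enumerate(zip(query, ref)):
        if (differing >> j) & 1:
            total += (q - o) ** 2
    return math.sqrt(total)

# Hypothetical example data, not taken from the paper.
ref = (0.5, 0.5, 0.5)
point = (0.9, 0.1, 0.5)
query = (0.2, 0.2, 0.7)

dist_to_ref, code = encode(point, ref)
bound = lower_bound(query, ref, code)
print(bin(code), bound, math.dist(query, point))
```

In a KNN search built this way, a candidate whose bound already exceeds the current k-th nearest distance could be discarded without fetching its full d-dimensional vector, which is the second filtering level the abstract describes.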