A New Index Structure Combines A Cluster Algorithm with Block Distance

Lifang Yang, Meng Di, Xianglin Huang, Fengfeng Duan
DOI: https://doi.org/10.1109/cisp.2015.7407935
2015-01-01
Abstract: Many index structures for high-dimensional data have become very complex, and their complexity may not be matched by a corresponding increase in performance. Block distance is widely used for similarity measurement because it is simple and efficient. Index structures that map high-dimensional data to one-dimensional values are also relatively simple and efficient, but none of them directly supports block distance for similarity search. In this paper, a new simple yet efficient index structure called CBlockB-Tree is proposed. Our approach uses a clustering algorithm to divide the high-dimensional space into several clusters. Then, for each cluster, the high-dimensional feature data in that cluster are mapped to one-dimensional key values given by the block distance between each feature vector and its cluster center, and a compact B+-tree is used to manage these key values. This index structure not only directly supports block distance for similarity search but also effectively supports Euclidean distance. Moreover, experimental results show that our simple index structure outperforms the NB-Tree and the BlockB-Tree we proposed previously.
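The mapping described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions: "block distance" is taken to mean the city-block (L1) distance, the cluster centers are supplied by the caller instead of being computed by a clustering algorithm, a sorted Python list searched with `bisect` stands in for the compact B+-tree, and the range search uses the standard triangle-inequality filter, which the abstract does not spell out. The names `block_distance`, `build_index`, and `range_search` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def block_distance(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_index(points, centers):
    """Assign each point to its nearest center and key it by that distance.

    Returns one sorted list of (key, point_id) pairs per cluster.
    The sorted list is a stand-in for a B+-tree (assumption).
    """
    clusters = [[] for _ in centers]
    for pid, p in enumerate(points):
        dists = [block_distance(p, c) for c in centers]
        cid = min(range(len(centers)), key=dists.__getitem__)
        clusters[cid].append((dists[cid], pid))
    for entries in clusters:
        entries.sort()
    return clusters


def range_search(points, centers, clusters, query, radius):
    """Return ids of all points within `radius` (block distance) of `query`."""
    results = []
    for center, entries in zip(centers, clusters):
        dq = block_distance(query, center)
        # Triangle inequality: |d(p, c) - d(q, c)| <= d(p, q), so any point
        # within `radius` of the query has a key in [dq - radius, dq + radius].
        lo = bisect_left(entries, (dq - radius, -1))
        hi = bisect_right(entries, (dq + radius, len(points)))
        for _, pid in entries[lo:hi]:
            # The key window only filters candidates; verify each one exactly.
            if block_distance(points[pid], query) <= radius:
                results.append(pid)
    return sorted(results)


points = [(0, 0), (1, 1), (10, 10), (11, 9), (5, 5)]
centers = [(0, 0), (10, 10)]
clusters = build_index(points, centers)
hits = range_search(points, centers, clusters, query=(1, 0), radius=2)
```

In this toy run, `hits` contains the ids of the two points near the origin. The one-dimensional key window discards most points without computing their full distance to the query, which is where an index of this kind would be expected to save work.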