HCIndex: a Hilbert-Curve-based Clustering Index for Efficient Multi-Dimensional Queries for Cloud Storage Systems
Xinyang Wang, Yu Sun, Qiao Sun, Weiwei Lin, James Z. Wang, Wei Li
DOI: https://doi.org/10.1007/s10586-022-03723-y
2022-01-01
Cluster Computing
Abstract: With the rapid development of the Internet of Things and cloud computing, HBase has become a popular choice for massive data storage because it reads and writes data efficiently. However, HBase does not support multi-dimensional queries on non-rowkey data, which hinders data analysis and processing. To address this issue, we first analyze the construction principles and deficiencies of secondary indexes and clustering indexes, and select the clustering index as the basis for optimization. Then, we choose the Hilbert curve, a space-filling curve, as the linearization technique, design a pre-partition algorithm and a subspace partition algorithm, and implement the Hilbert-curve-based clustering index (HCIndex), which supports multi-dimensional point queries and range queries. Finally, the performance of HCIndex is verified by comparison experiments with HBase Scan, HiBase and CCIndex. The experimental results show that HCIndex greatly improves query efficiency at the cost of limited extra storage: the index data it must store is only 1.7 times the size of the original HBase data table. Compared with HBase Scan, the query efficiency of HCIndex is more than 4 times higher for multi-dimensional point queries and more than 2 times higher for range queries. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries over massive data in cloud storage systems.
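As an illustration of the linearization step mentioned in the abstract, the following minimal sketch maps a two-dimensional grid cell to its position along a Hilbert curve and converts a rectangular range into contiguous key intervals. It uses the standard textbook bit-manipulation conversion, not the paper's pre-partition or subspace partition algorithms, which the abstract does not specify. The names `hilbert_index` and `range_to_intervals`, the two-dimensional setting, and the brute-force cell enumeration are assumptions made only for this example.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order x 2**order grid to its distance
    along the Hilbert curve (standard iterative conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def range_to_intervals(order, x_lo, x_hi, y_lo, y_hi):
    """Hypothetical helper: turn an inclusive rectangle into sorted,
    merged intervals of Hilbert indices. Enumerates every cell, so it
    is only suitable for small illustrative grids."""
    keys = sorted(
        hilbert_index(order, x, y)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    intervals = []
    for k in keys:
        if intervals and k == intervals[-1][1] + 1:
            intervals[-1][1] = k          # extend the current run
        else:
            intervals.append([k, k])      # start a new run
    return [tuple(iv) for iv in intervals]
```

In a store such as HBase, each interval could in principle be served by one rowkey range scan, so fewer and longer intervals mean fewer scans. The locality of the Hilbert curve tends to keep nearby cells in nearby key ranges, which is the usual motivation for choosing it as a linearization.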