Multi-Dimensional Index Over A Key-Value Store For Semi-Structured Data

Xin Gao, Yong Qi, Di Hou
DOI: https://doi.org/10.1007/978-3-030-28061-1_18
2019-01-01
Abstract: Informal data structures and data volumes in the trillions of records make it challenging for databases to store and retrieve semi-structured data. Most researchers address these issues with R-trees, KD-trees, and space-filling curves, but these structures are poorly suited to the default and discrete values of semi-structured data, and some even require sampling before storage. We present MD-Index, a scalable multi-dimensional indexing system that supports high-throughput, real-time range queries. MD-Index builds a bitmap index of sliced data over a range-partitioned key-value store. The underlying key-value store provides high throughput, large storage capacity, high availability, and fault tolerance, while the bitmaps provide the multi-dimensional index over the data. In addition, MD-Index encodes discrete values as the hash code of a slice, and stores the data and the bitmap of a slice in the same region (a storage unit of the range-partitioned key-value store) to exploit distributed computing and data locality. Our prototype of MD-Index is built on HBase, a standard key-value database. Experimental results show that MD-Index can store and retrieve trillions of semi-structured records and achieve a throughput of two million records per second.
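The idea of answering a multi-dimensional range query from per-slice bitmaps can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not use HBase: the `Slice` class, its method names, and the sample fields `temp` and `city` are all hypothetical, and Python integers stand in for compressed bitmaps. Records that lack a field simply set no bit for that dimension, which is one plausible way to handle default values.

```python
from collections import defaultdict


class Slice:
    """One slice of records plus its bitmaps (hypothetical layout, in memory only)."""

    def __init__(self):
        self.records = []
        # bitmaps[dim][value] is an int used as a bitset:
        # bit i is set when record i has that value in dimension dim.
        self.bitmaps = defaultdict(lambda: defaultdict(int))

    def add(self, record):
        """Append a record (a dict) and set its bit in every bitmap it touches."""
        row = len(self.records)
        self.records.append(record)
        for dim, value in record.items():
            self.bitmaps[dim][value] |= 1 << row

    def _range_bitmap(self, dim, low, high):
        """OR together the bitmaps of all values in [low, high] for one dimension."""
        result = 0
        for value, bits in self.bitmaps.get(dim, {}).items():
            if low <= value <= high:
                result |= bits
        return result

    def query(self, ranges):
        """Return row ids matching every (low, high) range; dimensions are ANDed."""
        result = (1 << len(self.records)) - 1  # start with all rows selected
        for dim, (low, high) in ranges.items():
            result &= self._range_bitmap(dim, low, high)
        return [i for i in range(len(self.records)) if (result >> i) & 1]


# Assumed sample data: semi-structured records with missing fields.
s = Slice()
s.add({"temp": 21, "city": "A"})
s.add({"temp": 25})
s.add({"temp": 30, "city": "B"})
s.add({"city": "A"})

rows_temp = s.query({"temp": (20, 26)})
rows_both = s.query({"temp": (20, 30), "city": ("A", "A")})
```

In this sketch each dimension contributes one OR pass over its value bitmaps, and the dimensions are then combined with a bitwise AND, so the query never scans the records themselves. In a system like the one described, the slice and its bitmaps would live in the same region so that this evaluation can run next to the data.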