Apara: Workload-Aware Data Partition and Replication for Parallel Databases.

Xiaolei Zhang, Chunxi Zhang, Yuming Li, Rong Zhang, Aoying Zhou
DOI: https://doi.org/10.1007/978-3-030-26075-0_15
2019-01-01
Abstract: Data partition and replication mechanisms directly determine query execution patterns in parallel database systems and therefore have a great impact on system performance. Several workload-aware data storage techniques have been proposed recently, but they suffer from limited support for complex workloads or from large storage requirements. To support complex analytical workloads over massive distributed database systems, we design and implement a workload-aware data partition and replication tool called Apara. We design two heuristic algorithms and define two cost models for effective calculation of data partitions and efficient use of replicas. We run a set of experiments to compare the performance of Apara with that of other representative work. The results show that Apara consistently outperforms the primary solutions on TPC-H workloads.