Correlation-aware Partitioning for Skewed Range Query Optimization

Wei Ge,Xianxian Li,Chunfeng Yuan,Yihua Huang
DOI: https://doi.org/10.1007/s11280-018-0547-4
2018-01-01
World Wide Web
Abstract:Data partitioning is an effective way to reduce cost and improve query performance in large-scale Web data analytical applications. State-of-the-art partitioning approaches on range queries is lacking of considering the correlation of data in a certain access patterns, especially in some skewed patterns. This paper presents a correlation-aware partitioning model for skewed range queries. It formulates partitioning optimization issue on continuous correlated data as a geometrical step curve fitting problem. Then, we prove that the optimal partitioning should split data on range query boundaries. On this basis, Range Boundary Based DP Partitioning is designed to induce the optimal partition and significantly reduce the computation cost compared to the baseline dynamic programming algorithm. Local is better than global. For efficiency, Bottom-up Merging Partitioning is proposed further to improve partitioning by bottom-up merging instead of searching. To evaluate the proposed approaches, sets of experiments are conducted under skewed range query workloads on skewed and uniform datasets, and show they do optimize the efficiency of data partitioning by hundreds of times.
What problem does this paper attempt to address?