SAQP++: Bridging the Gap Between Sampling-Based Approximate Query Processing and Aggregate Precomputation.

Dongxiang Zhang, Mingtao Lei, Xiang Zhu
DOI: https://doi.org/10.1109/dsc.2018.00044
2018-01-01
Abstract: In the booming Big Data era, interactive data analytics is becoming a routine demand for data scientists. As data continues to grow, answering aggregation queries in a timely manner is increasingly challenging. To address this challenge, researchers have proposed two separate methods: sampling-based approximate query processing (SAQP) and aggregate precomputation (Materialization), such as data cubes. In this paper, we propose a novel framework, SAQP++, which combines sampling with precomputed aggregates to achieve both relatively accurate results and acceptable preparation time. Using the SUM aggregate function as an example, we propose an optimal materialization solution under a uniform distribution and a hill-climbing-based materialization algorithm under a non-uniform distribution. Our experiments show that SAQP++ achieves a more flexible and better trade-off among preprocessing cost, query response time, and answer quality than the SAQP or Materialization alternatives.
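To make the idea of combining precomputed aggregates with sampling concrete, here is a minimal Python sketch for a range SUM query. It is an illustration under stated assumptions, not the algorithm from the paper: the fixed-size block layout, the Bernoulli sample, and every function name (`build_block_sums`, `draw_sample`, `hybrid_sum`) are hypothetical. Blocks fully covered by the query are answered exactly from materialized sums, and only the uncovered edges are estimated from the sample.

```python
import random


def build_block_sums(values, block_size):
    """Materialize the exact SUM of each aligned block of `block_size` rows."""
    return [sum(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def draw_sample(values, rate, seed=0):
    """Bernoulli sample: keep each (row_index, value) pair with probability `rate`."""
    rng = random.Random(seed)
    return [(i, v) for i, v in enumerate(values) if rng.random() < rate]


def estimate_from_sample(sample, rate, lo, hi):
    """Scale-up estimate of SUM over rows [lo, hi) using only sampled rows."""
    return sum(v for i, v in sample if lo <= i < hi) / rate


def hybrid_sum(block_sums, block_size, sample, rate, lo, hi):
    """Answer SUM over rows [lo, hi): exact for covered blocks, sampled for edges."""
    first_full = -(-lo // block_size)  # first block starting at or after lo
    last_full = hi // block_size       # exclusive end of blocks ending at or before hi
    if first_full >= last_full:
        # No block is fully covered, so fall back to the sample alone.
        return estimate_from_sample(sample, rate, lo, hi)
    exact = sum(block_sums[first_full:last_full])
    left = estimate_from_sample(sample, rate, lo, first_full * block_size)
    right = estimate_from_sample(sample, rate, last_full * block_size, hi)
    return exact + left + right


values = list(range(1000))
block_sums = build_block_sums(values, block_size=100)
sample = draw_sample(values, rate=0.1)
approx = hybrid_sum(block_sums, 100, sample, 0.1, lo=250, hi=730)
```

In this sketch a smaller `block_size` leaves less of each query to the sample, which improves accuracy at the price of more materialized sums. This is the kind of preprocessing-versus-quality trade-off the abstract describes, though how SAQP++ actually chooses what to materialize is not shown here.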