Optimizing Window Aggregate Functions Via Random Sampling
Guangxuan Song, Wenwen Qu, Yilin Wang, Xiaoling Wang
DOI: https://doi.org/10.1007/978-3-319-63564-4_19
2017-01-01
Abstract: Window functions have been part of the SQL standard since 2003 and have been studied extensively over the past decade. With growing demand for analytics tools, window functions have found an increasing number of applications. Although mainstream commercial databases support window functions, their existing implementation strategies are too inefficient for real-time processing of big data. Recently, several sampling-based algorithms (e.g., online aggregation) have been proposed for handling large and complex data in relational databases, offering a flexible tradeoff between accuracy and efficiency. However, sampling techniques have not yet been applied to window functions in databases. In this paper, we propose two algorithms for evaluating window functions based on two sampling techniques: Naive Random Sampling and Incremental Random Sampling. The proposed algorithms are highly efficient and general enough to be combined with other existing window-function algorithms. We evaluated our algorithms in the latest version of PostgreSQL, and experiments on the TPC-H benchmark demonstrate superior performance.
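The abstract names the two sampling strategies but does not describe how they work. The following is a hypothetical sketch, not the authors' algorithms. It assumes a moving AVG over a `ROWS BETWEEN k PRECEDING AND CURRENT ROW` frame. "Naive" sampling is read here as drawing a fresh uniform sample from every frame. "Incremental" sampling is read as fixing a per-row Bernoulli inclusion decision once, so the sampled subset can be maintained as the frame slides. The function names and the `rate` parameter are illustrative assumptions.

```python
import random
from collections import deque


def exact_moving_avg(values, k):
    """Exact AVG over the frame ROWS BETWEEN k PRECEDING AND CURRENT ROW."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - k): i + 1]
        out.append(sum(frame) / len(frame))
    return out


def naive_sampled_moving_avg(values, k, rate, rng):
    """Hypothetical naive variant: draw an independent uniform sample
    (without replacement) from each frame and average it."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - k): i + 1]
        m = max(1, round(rate * len(frame)))  # sample size, at least one row
        sample = rng.sample(frame, m)
        out.append(sum(sample) / m)
    return out


def incremental_sampled_moving_avg(values, k, rate, rng):
    """Hypothetical incremental variant: each row is sampled once with
    probability `rate`; the frame then slides by adding the entering
    sampled row and evicting sampled rows that fall out of the frame."""
    picked = [rng.random() < rate for _ in values]
    window = deque()  # indices of sampled rows inside the current frame
    running_sum = 0.0
    out = []
    for i, v in enumerate(values):
        if picked[i]:
            window.append(i)
            running_sum += v
        # Evict sampled rows that are no longer in the frame [i - k, i].
        while window and window[0] < i - k:
            running_sum -= values[window.popleft()]
        if window:
            out.append(running_sum / len(window))
        else:
            # Assumed fallback: no sampled row in this frame, so use the current row.
            out.append(float(v))
    return out
```

With `rate=1.0`, both estimators reduce to the exact moving average. Lower rates aggregate fewer rows per frame at the cost of accuracy. The incremental variant touches each row at most twice (once on entry, once on eviction), whereas the naive variant redraws its sample for every frame.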