ShadowAQP: Efficient Approximate Group-by and Join Query Via Attribute-oriented Sample Size Allocation and Data Generation.
Rong Gu, Han Li, Haipeng Dai, Wenjie Huang, Jie Xue, Meng Li, Jiaqi Zheng, Haoran Cai, Yihua Huang, Guihai Chen
DOI: https://doi.org/10.14778/3625054.3625059
IF: 2.5
2023-01-01
Proceedings of the VLDB Endowment
Abstract: Approximate query processing (AQP) is a key technique for querying big data because it returns approximate answers efficiently. To address the non-trivial sample selection and heavy sampling cost of AQP, we propose ShadowAQP, an efficient and accurate approach based on attribute-oriented sample size allocation and data generation. We select samples according to the group-by and join attributes, and determine the sample size for each group of unique value combinations to improve query accuracy. We design a conditional variational autoencoder model with automatic table data encoding and model update strategies. To further improve accuracy and efficiency, we propose a set of extensions, including parallel multi-round sampling aggregation, data outlier-aware sampling, and dimension reduction optimization. Evaluation results on diverse datasets show that, compared with state-of-the-art (SOTA) approaches, ShadowAQP improves query speed by 5.8× on average (up to 12.8×) while reducing query error by 74% on average (up to 95%).
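The idea of fixing a sample size per group-by value can be illustrated with a small sketch. This is not the allocation rule from the paper, which the abstract does not specify. It assumes a plain proportional allocation with a per-group floor, and it draws samples from the stored rows rather than generating them with a conditional variational autoencoder. The function names `allocate_sample_sizes` and `approx_group_sum` and the parameter `min_per_group` are hypothetical.

```python
import random
from collections import defaultdict


def allocate_sample_sizes(group_counts, budget, min_per_group=1):
    """Split a total sample budget across groups.

    Assumption: proportional allocation with a floor, so that rare
    groups still receive at least `min_per_group` samples. A group
    never receives more samples than it has rows.
    """
    total = sum(group_counts.values())
    sizes = {}
    for group, count in group_counts.items():
        share = int(budget * count / total)  # proportional share, rounded down
        sizes[group] = min(count, max(min_per_group, share))
    return sizes


def approx_group_sum(rows, key, value, budget, seed=0):
    """Estimate SELECT key, SUM(value) ... GROUP BY key from per-group samples."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    sizes = allocate_sample_sizes(
        {g: len(vals) for g, vals in groups.items()}, budget
    )

    estimates = {}
    for group, vals in groups.items():
        sample = rng.sample(vals, sizes[group])
        # Scale the sample sum by (group size / sample size).
        estimates[group] = sum(sample) * len(vals) / len(sample)
    return estimates


if __name__ == "__main__":
    rows = [{"region": "a", "amount": 2} for _ in range(50)]
    rows += [{"region": "b", "amount": 5} for _ in range(5)]
    print(allocate_sample_sizes({"a": 50, "b": 5}, budget=10))
    print(approx_group_sum(rows, "region", "amount", budget=10))
```

The floor is what keeps the small group `b` in the result. With purely proportional allocation its share would round down to zero and the group would vanish from the approximate answer.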