Thorough Data Pruning for Join Query in Database System
Jintao Gao, Zhanhuai Li, Jian Sun
DOI: https://doi.org/10.1109/tsusc.2023.3279382
2023-01-01
IEEE Transactions on Sustainable Computing
Abstract: Improving the robustness and efficiency of multi-way equijoin queries is challenging, whether in centralized or distributed database systems. A large amount of unnecessary data is carried through query processing, which seriously degrades both metrics. If this unnecessary data can be thoroughly pruned in advance, robustness and efficiency improve substantially. However, the pruning power of current strategies, such as predicate push-down and algebraic equivalence, is limited. We present deepDP, a powerful, generalized, and efficient strategy for data pruning. deepDP builds multiple independent pruning spaces by generating longest transitive closures and applies an appropriate data pruning strategy to each pruning space. To prune unnecessary data thoroughly, deepDP employs an $\alpha \cdot \beta$ pruning strategy that cleans each pruning space based on a newly designed statistic, Hollow Range, and re-shuffles the elements in all pruned spaces to maximize the robustness and efficiency benefits while minimizing invasiveness. We implement deepDP in PostgreSQL, although the approach is not limited to it, and evaluate it on TPC-H, JOB, and our synthetic benchmark, DHR. The experimental results show that, compared to traditional data pruning strategies, deepDP improves the efficiency of multi-way equijoin queries by 3.5x.
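The general idea of grouping join columns through transitive closure can be illustrated with a small sketch. This is not deepDP's algorithm: the abstract does not define Hollow Range or the $\alpha \cdot \beta$ strategy, so neither is modeled here. The sketch only shows the well-known step of closing equijoin predicates transitively and sharing range filters across each resulting group. The function names `equivalence_classes` and `propagate_ranges`, the column names, and the filter values are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(join_predicates):
    """Group columns connected by equijoin predicates (transitive closure).

    join_predicates: iterable of (column, column) pairs, e.g. ("a.x", "b.x").
    Uses a simple union-find; each resulting set is one group of columns
    that must all hold equal values in any joined row.
    """
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in join_predicates:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    groups = defaultdict(set)
    for col in list(parent):
        groups[find(col)].add(col)
    return list(groups.values())


def propagate_ranges(classes, filters):
    """Intersect the known (low, high) filters in each class and apply the
    result to every column in that class. Classes with no filter are skipped.
    """
    derived = {}
    for cls in classes:
        known = [filters[c] for c in cls if c in filters]
        if not known:
            continue
        low = max(r[0] for r in known)
        high = min(r[1] for r in known)
        for col in cls:
            derived[col] = (low, high)
    return derived


# Hypothetical three-way join: a.x = b.x AND b.x = c.x AND c.y = d.y
predicates = [("a.x", "b.x"), ("b.x", "c.x"), ("c.y", "d.y")]
# Hypothetical single-table range filters
filters = {"a.x": (10, 50), "c.x": (30, 90)}

classes = equivalence_classes(predicates)
derived = propagate_ranges(classes, filters)
```

In this toy case `b.x` has no filter of its own, yet it inherits the intersected range of `a.x` and `c.x`, so rows of `b` outside that range could be dropped before the join. The group containing `c.y` and `d.y` has no filter and is left untouched.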