Efficient and Effective Cardinality Estimation for Skyline Family.

Xiaoye Miao, Yangyang Wu, Jiazhen Peng, Yunjun Gao, Jianwei Yin
DOI: https://doi.org/10.1145/3588958
2023-01-01
Abstract: Cardinality estimation, i.e., predicting the query result size, is a fundamental problem in databases. Existing skyline cardinality estimation methods are computationally infeasible for massive skyline queries over large-scale databases. In this paper, we introduce a unified skyline family w.r.t. various skyline variants. We propose an efficient and effective end-to-end skyline family cardinality estimation model, named EECE. EECE consists of two modules: unsupervised data distribution learning (DDL) and supervised monotonic cardinality estimation (MCE). DDL leverages the mixture data guided transformer to learn the distribution of database and query parameters for model pre-training. MCE further incorporates supervised learning and parameter clamping to enhance the estimation under monotonicity guarantees. We develop an efficient incremental learning algorithm that lets EECE adapt to updates of the database and query logs. Extensive experiments on several real-world and synthetic datasets demonstrate that EECE speeds up cardinality estimation by six orders of magnitude, with an accuracy gain of more than 39%, compared to the state-of-the-art approaches.
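To make the estimated quantity concrete, the following minimal sketch computes the exact cardinality of a classic skyline query by brute force. It is not EECE or any method from the paper; the function names, the sample data, and the convention that smaller values are better in every dimension are all assumptions made for illustration. The quadratic pairwise comparison hints at why exact computation becomes costly for many queries over large data, which is the cost an estimator aims to avoid.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumed convention: smaller is better in every dimension. a dominates b
    when a is no worse everywhere and strictly better in at least one dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_cardinality(points: List[Point]) -> int:
    """Exact result size of the skyline query: the value an estimator predicts."""
    return len(skyline(points))


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels: List[Point] = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
print(skyline(hotels))
print(skyline_cardinality(hotels))
```

Here (70, 5) is dominated by (60, 3) and (90, 9) is dominated by (50, 8), so only the remaining three hotels form the skyline.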