Automating localized learning for cardinality estimation based on XGBoost

Jieming Feng,Zhanhuai Li,Qun Chen,Hailong Liu
DOI: https://doi.org/10.1007/s10115-024-02142-2
IF: 2.7
2024-06-02
Knowledge and Information Systems
Abstract:For cardinality estimation in DBMS, building multiple local models instead of one global model can usually improve estimation accuracy as well as reducing the effort to label large amounts of training data. Unfortunately, the existing approach of localized learning requires users to explicitly specify which query patterns a local model can handle. Making these decisions is very arduous and error-prone for users; to make things worse, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which can automatically build an optimal combination of local models given a query workload. It consists of two phases: 1) model initialization; 2) model evolution. In the first phase, it clusters training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters to identify an optimal combination by reconstructing local models. We formulate the problem of identifying the optimal combination of local models as a combinatorial optimization problem and present an efficient heuristic algorithm, named MMS ( M odels M erging and S plitting), for its solution due to its exponential complexity. Finally, we validate its performance superiority over the existing learning alternatives by extensive experiments on real datasets.
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?