Table Embedding Models Based on Contrastive Learning for Improved Cardinality Estimation

Hongwei Li, Chaokun Wang, Ziyang Liu
DOI: https://doi.org/10.1007/978-981-97-7238-4_31
2024-01-01
Abstract: Cardinality estimation is crucial in enhancing the performance of query optimizers within database management systems. It helps to optimize query plans by predicting the size of intermediate query results and the final output, allowing for more efficient execution strategies. Many cardinality estimation methods commonly assume independence between tables, and they predict the size of result sets by statistically analyzing the data stored in databases. However, this assumption may lead to inaccurate estimates, as inter-column correlations often exist in real-world datasets. This paper introduces an innovative table data partitioning method aimed at enhancing the accuracy of cardinality estimation without unduly affecting estimation efficiency. We model the embedding of row data within tables and employ contrastive learning techniques to partition each table into several sub-tables, ensuring minimal inter-column correlation within each sub-table. This strategy allows for a more accurate reflection of true column correlations for cardinality estimation methods that rely on independence assumptions, thereby improving overall accuracy. Experimental results on the IMDb and TPC-DS datasets show that our approach significantly improves cardinality estimation accuracy across various estimation methods.
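The intuition behind partitioning can be illustrated with a toy sketch. This is not the paper's method: the partition below is chosen by hand rather than learned from row embeddings with contrastive learning, and the table, column names, and helper functions (`independent_estimate`, `partitioned_estimate`) are hypothetical. It only shows why an estimator that assumes column independence can become more accurate when applied per sub-table and summed.

```python
def independent_estimate(rows, predicate):
    """Estimate how many rows satisfy every `column == value` condition,
    assuming the columns are statistically independent."""
    n = len(rows)
    if n == 0:
        return 0.0
    estimate = float(n)
    for column, value in predicate.items():
        matching = sum(1 for row in rows if row[column] == value)
        estimate *= matching / n  # multiply per-column selectivities
    return estimate


def partitioned_estimate(partitions, predicate):
    """Apply the independence-based estimator to each sub-table and sum."""
    return sum(independent_estimate(part, predicate) for part in partitions)


# Hypothetical toy table in which columns "a" and "b" are perfectly correlated.
rows = [{"a": 0, "b": 0}] * 50 + [{"a": 1, "b": 1}] * 50
predicate = {"a": 0, "b": 0}

# Ground truth, computed by scanning the table.
true_cardinality = sum(
    1 for row in rows if all(row[c] == v for c, v in predicate.items())
)

# Independence assumption over the whole table.
whole_table = independent_estimate(rows, predicate)

# Hand-picked partition (an assumption standing in for a learned one):
# within each sub-table the columns no longer vary together.
partitions = [
    [row for row in rows if row["a"] == 0],
    [row for row in rows if row["a"] == 1],
]
per_partition = partitioned_estimate(partitions, predicate)

print(true_cardinality, whole_table, per_partition)
```

On this toy data the whole-table estimate is half the true cardinality, while the per-partition estimate matches it, because the correlation between the two columns disappears inside each sub-table.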