Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost Estimation

Shuhuan Fan, Mengshu Hou, Rui Xi, Wenwen Ma
DOI: https://doi.org/10.1145/3627673.3679632
2024-01-01
Abstract: Learning-based models have shown promise in addressing query optimization challenges in the database field, where the learned cost model plays a central role. While these models outperform traditional optimizers on static datasets, their resilience and reliability in real-world applications remain a concern, limiting their widespread adoption. In this paper, we take a step towards a practical cost estimation model, named Tosure, which quantifies the uncertainty of its cost estimates and generalizes to unseen databases accurately and efficiently. It consists primarily of two modules: a Cross-Database Representation (CDR) module and a Cost Estimation with Uncertainty (CEU) module. The CDR module captures transferable features by focusing on a minimal feature set, using a deep-learning network, thereby enhancing the model's generalization capabilities. The CEU module introduces a novel Neural Network Gaussian Process (NNGP) to quantify the uncertainty in cost estimation, yielding more robust estimates with an upper bound. To improve the model's performance, we pre-train it on diverse large-scale datasets. Furthermore, we implement the model and integrate it with a traditional query optimizer to validate its usability and effectiveness in real-world scenarios. Extensive experiments demonstrate that Tosure outperforms state-of-the-art methods, improving cost estimation accuracy by 20% and doubling robustness.
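The abstract does not describe the CEU module's architecture, its input features, or how the upper bound is chosen. As a generic illustration only, the sketch below shows how an NNGP can attach an uncertainty-aware upper bound to a cost estimate: it uses the standard NNGP kernel of an infinitely wide fully connected ReLU network (the arc-cosine recursion), performs exact Gaussian-process regression, and reports `mean + z * std` as the upper bound. The function names, depth, `sigma_w2`, `sigma_b2`, `noise`, `z`, and the synthetic "plan features" are all assumptions made for this example, not details of Tosure.

```python
import numpy as np


def nngp_kernel(X1, X2, depth=2, sigma_w2=1.6, sigma_b2=0.1):
    """NNGP kernel of an infinitely wide ReLU MLP (illustrative hyperparameters)."""
    d = X1.shape[1]
    # Layer-0 covariances: scaled input inner products plus bias variance.
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d
    for _ in range(depth):
        norm = np.sqrt(np.outer(k11, k22))
        cos_t = np.clip(k12 / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # Arc-cosine recursion: expected product of ReLU activations.
        k12 = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_t)
        # On the diagonal theta = 0, so the recursion reduces to k / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12


def nngp_predict(X_train, y_train, X_test, noise=1e-2, z=1.96, **kw):
    """GP posterior of the latent cost: mean, std, and mean + z * std."""
    K = nngp_kernel(X_train, X_train, **kw) + noise * np.eye(len(X_train))
    K_s = nngp_kernel(X_train, X_test, **kw)          # (n_train, n_test)
    K_ss_diag = np.diag(nngp_kernel(X_test, X_test, **kw))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(K_ss_diag - np.sum(v * v, axis=0), 0.0)
    std = np.sqrt(var)
    return mean, std, mean + z * std


# Hypothetical usage: rows of X stand in for query-plan feature vectors and
# y for log-scaled costs; both are synthetic here.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = np.log1p(np.abs(X @ np.array([1.0, -2.0, 0.5, 0.0])))
mean, std, upper = nngp_predict(X, y, X[:5])
```

Using the upper bound rather than the mean lets an optimizer penalize plans whose cost estimate is uncertain, for example on an unseen database where the test features lie far from the training data and the posterior standard deviation grows.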