Less is More: Towards Lightweight Cost Estimator for Database Systems

Weiping Yu, Siqiang Luo
2023-03-22
Abstract: We present FasCo, a simple yet effective learning-based estimator for the cost of executing a database query plan. FasCo requires significantly less training time and a lower inference cost than state-of-the-art approaches, while achieving higher estimation accuracy. The effectiveness of FasCo comes from embedding abundant explicit execution-plan-based features and incorporating a novel technique called cardinality calibration. Extensive experimental results show that FasCo is orders of magnitude more efficient than state-of-the-art methods: on the JOB-M benchmark dataset, it reduces the number of training plans by 98%, cutting training time from more than two days to about eight minutes while achieving better accuracy. Furthermore, in dynamic environments, FasCo can maintain satisfactory accuracy even without retraining, narrowing the gap between learning-based estimators and real systems.
Databases
What problem does this paper attempt to address?
The paper addresses a problem in database systems: existing machine learning (ML) cost estimation models improve estimation accuracy but often come with high training and inference costs. Specifically, these models require significant training time and computational resources on large-scale datasets, which limits their use in dynamic environments, especially in real-world database systems where data is frequently updated. Many existing cost estimation models also rely on complex cardinality estimation processes, which further increases the cost.

To address these issues, the paper proposes a lightweight cost estimation model called **FasCo**. Its design goal is efficient and accurate cost estimation: a smaller model, lower training overhead, and faster inference, while maintaining or improving estimation accuracy. The main contributions of FasCo include:

1. **Lightweight Model Design**: FasCo adopts a simple multi-layer perceptron (MLP) network structure and simplifies the execution plan tree by merging single nodes and their child nodes, thereby significantly reducing training and inference costs.
2. **Explicit Feature Selection**: FasCo integrates rich explicit features such as operators, subqueries, and cardinality, which help improve the model's accuracy.
3. **Cardinality Calibration Technique**: FasCo introduces a sampling-based cardinality calibration method to enhance the accuracy of cardinality estimation, thereby further improving the precision of cost estimation.

With these innovations, experimental results on multiple benchmark datasets show that FasCo not only far exceeds existing methods in training and inference efficiency but also achieves better accuracy, especially in dynamic database environments.
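
To make the idea of sampling-based cardinality calibration more concrete, here is a minimal Python sketch. It is not FasCo's actual procedure, which this summary does not specify. It assumes a table held in memory as a list, a uniform random sample, and a simple fallback to the optimizer's estimate when no sampled row matches. The function names `sampled_cardinality` and `calibrated_cardinality` are hypothetical.

```python
import random


def sampled_cardinality(rows, predicate, sample_size, seed=0):
    """Estimate how many rows satisfy `predicate` from a uniform sample.

    Returns None when the sample is empty or contains no matching row,
    because scaling zero hits up to the table size is not informative.
    """
    rng = random.Random(seed)
    # Use the whole table when the requested sample is at least as large.
    sample = rows if sample_size >= len(rows) else rng.sample(rows, sample_size)
    if not sample:
        return None
    hits = sum(1 for row in sample if predicate(row))
    if hits == 0:
        return None
    # Scale the sample hit count up to the full table size.
    return hits * len(rows) / len(sample)


def calibrated_cardinality(optimizer_estimate, rows, predicate, sample_size, seed=0):
    """Prefer the sample-based estimate; fall back to the optimizer's estimate.

    Assumption for illustration only: the real calibration rule may combine
    the two estimates differently.
    """
    sampled = sampled_cardinality(rows, predicate, sample_size, seed)
    return optimizer_estimate if sampled is None else sampled


if __name__ == "__main__":
    table = list(range(1000))
    # The optimizer guessed 500 rows; the sample-based estimate replaces it.
    print(calibrated_cardinality(500.0, table, lambda x: x < 100, sample_size=200))
```

In a setup like this, the calibrated cardinality would be fed to the cost model as an input feature in place of the optimizer's raw estimate. Sample size trades estimation accuracy against inference overhead.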