Abstract: We present FasCo, a simple yet effective learning-based estimator for the cost of executing a database query plan. FasCo requires significantly less training time and a lower inference cost than state-of-the-art approaches, while achieving higher estimation accuracy. The effectiveness of FasCo comes from embedding abundant explicit execution-plan-based features and incorporating a novel technique called cardinality calibration. Extensive experimental results show that FasCo achieves orders of magnitude higher efficiency than state-of-the-art methods: on the JOB-M benchmark dataset, it cuts the number of training plans by 98%, reducing training time from more than two days to about eight minutes while achieving better accuracy. Furthermore, in dynamic environments, FasCo can maintain satisfactory accuracy even without retraining, narrowing the gap between learning-based estimators and real systems.

What problem does this paper attempt to address?

The paper attempts to address the issue in database systems where existing machine learning (ML) cost estimation models, while improving estimation accuracy, often come with high training and inference costs. Specifically, these models require significant training time and computational resources when handling large-scale datasets, which limits their application in dynamic environments, especially in real-world database systems where data is frequently updated. Additionally, many existing cost estimation models rely on complex cardinality estimation processes, further increasing the cost.

To address these issues, the paper proposes a lightweight cost estimation model called **FasCo**. The design goal of FasCo is to achieve efficient and accurate cost estimation by reducing model size, lowering training overhead, and speeding up inference, while maintaining or improving estimation accuracy. The main contributions of FasCo include:

1. **Lightweight Model Design**: FasCo adopts a simple multi-layer perceptron (MLP) network structure, simplifying the execution plan tree structure by merging single nodes and their child nodes, thereby significantly reducing training and inference costs.
2. **Explicit Feature Selection**: FasCo integrates rich explicit features such as operators, subqueries, cardinality, etc., which help improve the model's accuracy.
3. **Cardinality Calibration Technique**: FasCo introduces a sampling-based cardinality calibration method to enhance the accuracy of cardinality estimation, thereby further improving the precision of cost estimation.

Through these innovations, experimental results on multiple benchmark datasets show that FasCo not only far exceeds existing methods in training and inference efficiency but also performs excellently in terms of accuracy, especially in dynamic database environments.
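The summary above does not spell out how FasCo's cardinality calibration works, so the following is only a generic sketch of sampling-based cardinality estimation, the general idea the third contribution builds on, and not the paper's actual method. It evaluates a predicate on a uniform random sample and scales the observed selectivity up to the full table. The helper names `sample_rows` and `sampled_cardinality`, the toy table, and the sample size are all assumptions made for illustration.

```python
import random


def sample_rows(rows, sample_size, seed=0):
    """Draw a uniform random sample without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    if sample_size >= len(rows):
        return list(rows)
    return rng.sample(rows, sample_size)


def sampled_cardinality(table_size, sample, predicate):
    """Estimate how many rows satisfy `predicate` by scaling the
    selectivity observed on the sample up to the full table size."""
    if not sample:
        return 0.0
    matches = sum(1 for row in sample if predicate(row))
    return table_size * matches / len(sample)


# Toy table (an assumption for illustration, not data from the paper).
rows = [{"id": i, "year": 1990 + i % 30} for i in range(10_000)]


def recent(row):
    return row["year"] >= 2010


sample = sample_rows(rows, 500)
estimate = sampled_cardinality(len(rows), sample, recent)
true_cardinality = sum(1 for row in rows if recent(row))

print(f"sampled estimate: {estimate:.0f}, true cardinality: {true_cardinality}")
```

In a learned cost estimator, an estimate like this could be fed in as a cardinality feature in place of (or alongside) the optimizer's own estimate; how FasCo combines the two is not stated in the summary.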

Less is More: Towards Lightweight Cost Estimator for Database Systems

QCFE: An efficient Feature engineering for query cost estimation

An End-to-End Learning-based Cost Estimator

COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations

Learning-Based Cooperative False Data Injection Attack and Its Mitigation Techniques in Consensus-Based Distributed Estimation

Cost-sensitive Regression Learning on Small Dataset Through Intra-Cluster Product Favoured Feature Selection

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Rethinking Learned Cost Models: Why Start from Scratch?

Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation.

Cardinality Estimation in DBMS

Adaptive Cardinality Estimation

Forecasting SQL Query Cost at Twitter

FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation

Database Query Cost Prediction Using Recurrent Neural Network

Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

PRICE: A Pretrained Model for Cross-Database Cardinality Estimation

Flow-Loss: Learning Cardinality Estimates That Matter

Cost-Based or Learning-Based?

A Learning-Based Approach to Estimate Statistics of Operators in Continuous Queries: a Case Study.

One stone, two birds: A lightweight multidimensional learned index with cardinality support