Abstract: We propose a novel model for learned query optimization which provides query hints leading to better execution plans. The model addresses the three key challenges in learned hint-based query optimization: reliable hint recommendation (ensuring non-degradation of query latency), efficient hint exploration, and fast inference. We provide an in-depth analysis of existing NN-based approaches to hint-based optimization and experimentally confirm the named challenges for them. Our alternative solution consists of a new inference schema based on an ensemble of context-aware models and a graph storage for reliable hint suggestion and fast inference, and a budget-controlled training procedure with a local search algorithm that solves the issue of exponential search space exploration. In experiments on standard benchmarks, our model demonstrates optimization capability close to the best achievable with coarse-grained hints. Controlling the degree of parallelism (query dop) in addition to operator-related hints enables our model to achieve 3x latency improvement on the JOB benchmark, which sets a new standard for optimization. Our model is interpretable and easy to debug, which is particularly important for deployment in production.
What problem does this paper attempt to address?
This paper addresses the difficulty that current learning-based query optimization methods have in providing reliable and efficient query hints. It focuses on three key issues:
1. **Reliable Hint Recommendation**: Ensure that the recommended query hints will not lead to the deterioration of query latency.
2. **Efficient Hint Exploration**: Explore the exponentially large space of hint combinations efficiently, so that training fits within a limited time budget.
3. **Fast Inference**: In practical applications, ensure that the model can quickly give optimization suggestions.
To solve these problems, the paper proposes a new model named HERO (Hint-based Efficient and Reliable Query Optimizer). HERO improves on existing learning-based query optimization methods in the following respects:
- **Reliable Hint Recommendation**: HERO adopts a new inference scheme based on an ensemble of context-aware models and a graph storage structure, so that a hint is recommended only when it can be trusted not to degrade latency.
- **Efficient Hint Exploration**: HERO uses a budget-controlled training procedure with a parallelized local search algorithm to cope with the exponentially large search space. The search adapts to different time budgets, and its parameters trade off performance improvement against search depth.
- **Fast Inference**: HERO uses a lightweight alternative to neural network models. This avoids their black-box behavior, makes the model more transparent and easier to debug, and lets it return optimization suggestions quickly.
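The idea of a budget-controlled local search over hint sets can be illustrated with a minimal sketch. It is not the paper's algorithm: the hint names, the `latency` callback, the first-improvement strategy, and the toy cost function are all assumptions made for illustration, and the search here is sequential rather than parallelized. The budget is taken to be the total latency spent executing candidate plans.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical coarse-grained hints: each one disables a planner operator.
HINTS = ("no_hashjoin", "no_mergejoin", "no_nestloop", "no_indexscan")

HintSet = FrozenSet[str]


def local_search(
    latency: Callable[[HintSet], float], budget: float
) -> Tuple[HintSet, float]:
    """First-improvement local search starting from the default (empty) hint set.

    `budget` caps the total latency spent on executing candidate hint sets.
    """
    cache: Dict[HintSet, float] = {}
    spent = 0.0

    def measure(hs: HintSet) -> float:
        nonlocal spent
        if hs not in cache:  # each hint set is executed at most once
            cache[hs] = latency(hs)
            spent += cache[hs]
        return cache[hs]

    best: HintSet = frozenset()
    best_t = measure(best)
    improved = True
    while improved and spent < budget:
        improved = False
        for h in HINTS:
            if spent >= budget:
                break
            # A neighbour differs from the current hint set in exactly one hint.
            cand = best ^ {h}
            t = measure(cand)
            if t < best_t:
                best, best_t, improved = cand, t, True
    return best, best_t


def toy_latency(hs: HintSet) -> float:
    """Made-up latency function standing in for real query execution."""
    t = 10.0
    if "no_nestloop" in hs:
        t -= 6.0
    if "no_hashjoin" in hs:
        t += 3.0
    if "no_indexscan" in hs and "no_nestloop" in hs:
        t -= 1.0
    return t


if __name__ == "__main__":
    print(local_search(toy_latency, budget=1000.0))
    print(local_search(toy_latency, budget=30.0))
```

Because the search starts from the default hint set and only moves to strictly faster candidates, the returned hint set is never slower than the default on the measured query, even when the budget runs out early.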
### Formula Representation
HERO's core objective can be stated formally. Suppose the execution time of query \(q\) is \(t(q,\theta,S)\), where \(\theta\) is the hint set and \(S\) is the statistical information. The goal of HERO is to minimize the expected query latency, including the cost of running the model itself, while ensuring that its predictions are reliable:
\[
\min_{M} \mathbb{E}_{q_i \sim P_Q}\left[\, t_M + t_i \mid \theta = M(\text{info}) \,\right]
\]
where \(t_M\) is the time spent running the model and computing its input information, \(t_i\) is the execution time of query \(q_i\), and \(\theta = M(\text{info})\) is the hint set that the model predicts from the input information. To ensure reliability, HERO also requires that, for every query \(q_i\in\text{supp}(P_Q)\), the latency with the predicted hints plus the model overhead does not exceed the latency with the default hints:
\[
\left(t_M + t_i \mid \theta = M(\text{info})\right) \leq \left(t_i \mid \theta_{\text{default}}\right)
\]
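As a rough illustration, both the objective and the reliability constraint can be estimated from measured latencies on a workload sample. The sketch below assumes that each query has already been timed under the predicted hints and under the default hints. The `Measurement` record and the function names are hypothetical, and the sample mean stands in for the expectation over \(P_Q\).

```python
from statistics import mean
from typing import List, NamedTuple


class Measurement(NamedTuple):
    """Timings for one query q_i (hypothetical record layout)."""
    t_model: float    # t_M: model run time plus time to compute its input
    t_hinted: float   # t_i under theta = M(info)
    t_default: float  # t_i under theta_default


def objective(ms: List[Measurement]) -> float:
    """Sample-mean estimate of E[t_M + t_i | theta = M(info)]."""
    return mean(m.t_model + m.t_hinted for m in ms)


def violations(ms: List[Measurement]) -> List[int]:
    """Indices of queries where t_M + t_i under the hints exceeds the default latency."""
    return [i for i, m in enumerate(ms) if m.t_model + m.t_hinted > m.t_default]


if __name__ == "__main__":
    sample = [
        Measurement(t_model=0.5, t_hinted=1.0, t_default=3.0),  # hints help
        Measurement(t_model=0.5, t_hinted=2.5, t_default=2.0),  # hints hurt
    ]
    print(objective(sample))
    print(violations(sample))
```

A model satisfies the reliability requirement on the sample only if `violations` returns an empty list. A low value of `objective` alone is not enough, since an average can hide individual regressions.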
### Summary
HERO addresses the shortcomings of current learning-based query optimization methods by introducing a new inference scheme, an ensemble of context-aware models, a graph storage structure, and a parallelized local search algorithm. With these changes, HERO achieves optimization capability close to the best achievable with coarse-grained hints on standard benchmarks, while keeping its hint recommendations reliable and the model interpretable.