Abstract: We propose a novel model for learned query optimization which provides query hints leading to better execution plans. The model addresses the three key challenges in learned hint-based query optimization: reliable hint recommendation (ensuring non-degradation of query latency), efficient hint exploration, and fast inference. We provide an in-depth analysis of existing NN-based approaches to hint-based optimization and experimentally confirm the named challenges for them. Our alternative solution consists of a new inference schema based on an ensemble of context-aware models and a graph storage for reliable hint suggestion and fast inference, and a budget-controlled training procedure with a local search algorithm that solves the issue of exponential search space exploration. In experiments on standard benchmarks, our model demonstrates optimization capability close to the best achievable with coarse-grained hints. Controlling the degree of parallelism (query dop) in addition to operator-related hints enables our model to achieve a 3x latency improvement on the JOB benchmark, which sets a new standard for optimization. Our model is interpretable and easy to debug, which is particularly important for deployment in production.

What problem does this paper attempt to address?

The paper addresses the difficulty that current learning-based query optimization methods have in providing reliable and efficient query hints. It focuses on three key issues:

1. **Reliable Hint Recommendation**: Ensure that recommended query hints do not degrade query latency.
2. **Efficient Hint Exploration**: Reduce the search space of hint combinations, thereby shortening training and inference time.
3. **Fast Inference**: Ensure that the model can produce optimization suggestions quickly in practical use.

To solve these problems, the authors propose a new model named HERO (Hint-based Efficient and Reliable Query Optimizer). HERO improves on existing learning-based query optimization methods in the following aspects:

- **Reliable Hint Recommendation**: HERO adopts a new inference scheme based on an ensemble of context-aware models and a graph storage structure, which ensures the reliability of hint recommendations. It also uses a budget-controlled training process, combined with a local search algorithm, to cope with the exponentially large search space.
- **Efficient Hint Exploration**: HERO introduces a parallelized local search algorithm that adapts to different time budgets, improving exploration efficiency. Parameterizing the local search process lets it trade off performance improvement against search depth.
- **Fast Inference**: HERO is designed as a lightweight and reliable alternative to neural network models. It avoids their black-box behavior, which makes the model more transparent, easier to debug, and quick to produce optimization suggestions.

### Formula Representation

The core ideas of HERO can be stated with a few formulas. Suppose the execution time of query \(q\) is \(t(q,\theta,S)\), where \(\theta\) is the hint set and \(S\) is the statistical information. The goal of HERO is to minimize the query latency while ensuring the reliability of prediction:

\[ \min_{M} \mathbb{E}_{q_i \sim P_Q}\left[\, t_M + t_i \mid \theta = M(\text{info}) \,\right] \]

where \(t_M\) is the time spent running the model and computing its input information, \(t_i\) is the execution time of query \(q_i\), and \(\theta = M(\text{info})\) is the hint set predicted from the input information. To ensure reliability, HERO also requires that for all queries \(q_i \in \text{supp}(P_Q)\):

\[ t_M + \left( t_i \mid \theta = M(\text{info}) \right) \leq \left( t_i \mid \theta_{\text{default}} \right) \]

That is, the hinted execution time plus the model overhead must never exceed the execution time under the default hint set.

### Summary

HERO addresses the problems of current learning-based query optimization methods by introducing a new inference scheme, an ensemble of context-aware models, a graph storage structure, and a parallelized local search algorithm. On standard benchmarks, these improvements bring HERO's optimization capability close to the best achievable with coarse-grained hints, while preserving the reliability and interpretability of the model.
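The interplay between budget-controlled local search and the non-degradation constraint \(t_M + t_i \leq t_i \mid \theta_{\text{default}}\) can be illustrated with a small Python sketch. This is not HERO's actual algorithm: it is a plain sequential one-flip local search, and the function names (`local_search`, `recommend`, `toy_latency`), the hint names, and the latency numbers are all hypothetical, chosen only for illustration. The budget here counts latency evaluations, which is an assumption; the summary speaks of time budgets.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Hints = FrozenSet[str]


def local_search(
    latency: Callable[[Hints], float],
    all_hints: Iterable[str],
    budget: int,
) -> Tuple[Hints, float]:
    """Flip one hint at a time, starting from the default (empty) hint set.

    Stops when `budget` latency evaluations are spent or when a full
    round over the neighbors brings no improvement.
    """
    ordered = sorted(all_hints)      # deterministic exploration order
    best: Hints = frozenset()        # assumption: empty set == default hints
    best_t = latency(best)
    used = 1                         # the default plan costs one evaluation
    improved = True
    while improved and used < budget:
        improved = False
        current = best               # neighbors are taken around this point
        for hint in ordered:
            if used >= budget:
                break
            candidate = current ^ {hint}   # toggle a single hint
            t = latency(candidate)
            used += 1
            if t < best_t:
                best, best_t, improved = candidate, t, True
    return best, best_t


def recommend(
    latency: Callable[[Hints], float],
    all_hints: Iterable[str],
    budget: int,
    t_model: float,
) -> Hints:
    """Return a hint set only if it satisfies the reliability constraint."""
    hints, t_hinted = local_search(latency, all_hints, budget)
    t_default = latency(frozenset())
    # Reliability: model overhead + hinted time must not exceed default time.
    if t_model + t_hinted <= t_default:
        return hints
    return frozenset()               # fall back to the default plan


def toy_latency(hints: Hints) -> float:
    """Hypothetical latency model (seconds) for a single query."""
    t = 10.0
    if "no_nestloop" in hints:
        t -= 4.0
    if "dop_4" in hints:
        t -= 3.0
    if "no_hashjoin" in hints:
        t += 5.0
    return t


ALL_HINTS = ["no_nestloop", "no_hashjoin", "dop_4"]
full = local_search(toy_latency, ALL_HINTS, budget=10)
tight = local_search(toy_latency, ALL_HINTS, budget=2)
```

In this toy setting a budget of 10 evaluations is enough to reach the best combination, while a budget of 2 stops after the first improving flip. When the model overhead `t_model` is large enough to cancel the gain, `recommend` returns the empty (default) hint set, which mirrors the fallback implied by the constraint.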

HERO: Hint-Based Efficient and Reliable Query Optimizer

COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations

Learned Query Optimizers: Evaluation and Improvement

Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

BitE : Accelerating Learned Query Optimization in a Mixed-Workload Environment

AQUA+: Query Optimization for Hybrid Database-MapReduce System.

FOSS: A Self-Learned Doctor for Query Optimizer

Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Qr-Hint: Actionable Hints Towards Correcting Wrong SQL Queries

Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries

Understanding and Optimizing Conjunctive Predicates Under Memory-Efficient Storage Layouts

Hebe: an Order-Oblivious and High-Performance Execution Scheme for Conjunctive Predicates.

Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency

AlphaQO: Robust Learned Query Optimizer

Detecting optimization bugs in database engines via non-optimizing reference engine construction

Online Sketch-based Query Optimization

LOGER: A Learned Optimizer Towards Generating Efficient and Robust Query Execution Plans

Learned Query Optimization by Constraint-Based Query Plan Augmentation

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

Scalable Computation of Optimized Queries for Sequential Diagnosis