R-Bot: An LLM-based Query Rewrite System

Zhaoyan Sun, Xuanhe Zhou, Guoliang Li
2024-12-03
Abstract: Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with its limitations in terms of inferior quality and low robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences for guiding LLMs to avoid hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on widely used benchmarks, and demonstrate the superior performance of our system, R-Bot, surpassing state-of-the-art query rewrite methods.
Databases, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is SQL query rewrite: transforming a query so that it executes more efficiently without changing its results. Traditional heuristic-based and learning-based methods are limited in quality and robustness. Specifically:

1. **Heuristic-based methods**:
   - **Fixed-order method**: Applies rules in a fixed order derived from practical experience, so it may not produce optimal results for queries that need a different rule order.
   - **Heuristic acceleration method**: Tries to explore the space of rule orders comprehensively, using heuristics to speed up the search, but may ignore dependencies between rules.
2. **Learning-based methods**:
   - Use neural networks to learn from historical query rewrites and apply the most beneficial rules, but adapt poorly to unseen database schemas and require additional training on a large number of new query rewrite examples.
3. **Challenges in applying large language models (LLMs)**:
   - Directly applying an LLM such as GPT-4 to query rewrite suffers from hallucinations, that is, inaccurate or irrelevant output. For example, on the DSB benchmark, directly using GPT-4 for query rewrite has a success rate of only 5.3%.

To address these challenges, the authors propose R-Bot, an LLM-based query rewrite system that improves query rewrite in the following ways:

- **Multi-source rewrite evidence preparation pipeline**: Collects and prepares rewrite evidence from multiple sources to guide the LLM and avoid hallucinations.
- **Hybrid structure-semantics retrieval method**: Combines structural and semantic analysis to retrieve the rewrite evidence most relevant to an online query.
- **Step-by-step LLM rewrite method**: Iteratively uses the retrieved evidence to select and arrange rewrite rules, refining the rewrite through self-reflection.

In summary, the main goal of this paper is to develop an efficient and robust query rewrite system that exploits the natural language and code understanding capabilities of LLMs while overcoming their hallucination problem, thereby significantly improving query rewrite performance.
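
To make the idea of hybrid structure-semantics retrieval more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that structural similarity can be approximated by Jaccard overlap of the SQL keywords a query uses, and that semantic similarity can be approximated by cosine similarity over bag-of-words counts (a stand-in for the embeddings a real system would likely use). The names `STRUCTURE_KEYWORDS`, `EVIDENCES`, `QUERY`, `retrieve`, and the weight `alpha` are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical structural markers; the paper's real structural features may differ.
STRUCTURE_KEYWORDS = {"join", "where", "group", "order", "union",
                      "exists", "in", "distinct", "limit", "having"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z_]+", text.lower())


def structure_features(sql: str) -> set[str]:
    """Set of structural keywords that occur in the SQL text."""
    return {t for t in tokenize(sql) if t in STRUCTURE_KEYWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query_sql: str, evidences: list[dict],
             alpha: float = 0.5, k: int = 1) -> list[str]:
    """Rank evidences by alpha * structural + (1 - alpha) * semantic score."""
    q_struct = structure_features(query_sql)
    q_words = Counter(tokenize(query_sql))
    scored = []
    for ev in evidences:
        structural = jaccard(q_struct, structure_features(ev["example_sql"]))
        semantic = cosine(q_words, Counter(tokenize(ev["description"])))
        scored.append((alpha * structural + (1 - alpha) * semantic, ev["name"]))
    scored.sort(reverse=True)  # highest combined score first
    return [name for _, name in scored[:k]]


# Toy evidence store and query, invented for this sketch.
EVIDENCES = [
    {"name": "subquery_to_join",
     "example_sql": "SELECT * FROM t WHERE a IN (SELECT b FROM s)",
     "description": "rewrite an in subquery into a join"},
    {"name": "remove_distinct",
     "example_sql": "SELECT DISTINCT id FROM t",
     "description": "drop a redundant distinct on a unique key"},
]
QUERY = "SELECT name FROM emp WHERE dept IN (SELECT id FROM dept WHERE city = 'x')"

if __name__ == "__main__":
    print(retrieve(QUERY, EVIDENCES, k=2))
```

In this toy setup the query shares its `WHERE ... IN` structure with the first evidence, so that evidence ranks first. The weight `alpha` controls how much structure counts relative to semantics; how R-Bot actually balances the two signals is not described in this summary.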