Abstract: Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with its limitations in terms of inferior quality and low robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences for guiding LLMs to avoid hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on widely used benchmarks, and demonstrate the superior performance of our system, R-Bot, surpassing state-of-the-art query rewrite methods.

What problem does this paper attempt to address?

The paper aims to optimize SQL query rewriting so that queries execute more efficiently without changing their results. Traditional heuristic-based and learning-based methods are limited in quality and robustness. Specifically:

1. **Heuristic-based methods**:
   - **Fixed-order method**: Applies rules in a fixed order derived from practical experience, but may not produce optimal results for queries that need a different rule order.
   - **Heuristic acceleration method**: Tries to explore many rule orders comprehensively, using heuristics to speed up the search, but may ignore the dependencies between rules.
2. **Learning-based methods**:
   - Use neural networks to learn from historical query rewrites and apply the most favorable rules, but adapt poorly to unseen database schemas and require additional training on a large number of new query rewrite examples.
3. **Challenges in applying large language models (LLMs)**:
   - Directly applying an LLM such as GPT-4 to query rewrite suffers from hallucinations, that is, inaccurate or irrelevant output. For example, on the DSB benchmark, the success rate of using GPT-4 directly for query rewrite is only 5.3%.

To address these challenges, the authors propose R-Bot, an LLM-based query rewrite system that improves query rewrite in the following ways:

- **Multi-source rewrite evidence preparation pipeline**: Collects and prepares rewrite evidence from multiple sources to guide the LLM and avoid hallucinations.
- **Hybrid structural-semantic retrieval method**: Combines structural and semantic analysis to retrieve the rewrite evidence most relevant to an online query.
- **Step-by-step LLM rewrite method**: Iteratively uses the retrieved evidence to select and arrange rewrite rules, refining the rewrite step by step through self-reflection.

In summary, the main goal of the paper is to develop an efficient and robust query rewrite system that exploits the natural language and code understanding capabilities of LLMs while overcoming their hallucination problem, thereby significantly improving query rewrite performance.
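
To make the idea of hybrid structural-semantic retrieval more concrete, the following is a minimal Python sketch, not R-Bot's actual implementation. Everything in it is an assumption made for illustration: the keyword-set Jaccard score standing in for structural analysis, the bag-of-words cosine standing in for semantic embeddings, the weight `alpha`, and the two example evidences and their names.

```python
import math
import re
from collections import Counter

# Hypothetical keyword set used as a crude structural signature of a query.
SQL_KEYWORDS = {"select", "from", "where", "join", "group", "order",
                "having", "union", "distinct", "in", "exists", "limit"}


def tokens(text):
    """Lowercase alphabetic tokens of a SQL string or a description."""
    return re.findall(r"[a-z_]+", text.lower())


def structural_similarity(query_a, query_b):
    """Jaccard overlap of the SQL keywords that appear in each query."""
    a = set(tokens(query_a)) & SQL_KEYWORDS
    b = set(tokens(query_b)) & SQL_KEYWORDS
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def semantic_similarity(text_a, text_b):
    """Cosine similarity of token counts (a stand-in for real embeddings)."""
    ca, cb = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def retrieve(query, evidences, alpha=0.5, top_k=1):
    """Rank evidences by a weighted sum of structural and semantic scores."""
    scored = []
    for ev in evidences:
        score = (alpha * structural_similarity(query, ev["example_sql"])
                 + (1 - alpha) * semantic_similarity(query, ev["description"]))
        scored.append((score, ev["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical evidence store with two made-up rewrite evidences.
evidences = [
    {"name": "remove_redundant_distinct",
     "example_sql": "SELECT DISTINCT id FROM t GROUP BY id",
     "description": "drop distinct when group by already makes rows unique"},
    {"name": "subquery_to_join",
     "example_sql": "SELECT a FROM t WHERE b IN (SELECT b FROM s)",
     "description": "rewrite an in subquery as a join"},
]

query = "SELECT name FROM emp WHERE dept IN (SELECT dept FROM closed)"
print(retrieve(query, evidences))
```

In this toy setup the online query shares its `SELECT ... WHERE ... IN (SELECT ...)` shape with the second evidence, so that evidence ranks first. A real system would presumably compare query plans or syntax trees and use learned embeddings rather than keyword sets and token counts.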

R-Bot: An LLM-based Query Rewrite System
