Abstract: The current state-of-the-art (SOTA) for automated text-to-SQL still falls well short of expert human performance as measured by execution accuracy (EX) on the BIRD-SQL benchmark. The most accurate methods are also slow and expensive. To advance the SOTA for text-to-SQL while reducing cost and improving speed, we explore the combination of low-cost fine tuning, novel methods for diverse retrieval-augmented generation (RAG) and new input and output formats that help large language models (LLMs) achieve higher EX. We introduce two new methods, Dubo-SQL v1 and v2. Dubo-SQL v1 sets a new record for EX on the holdout test set of BIRD-SQL. Dubo-SQL v2 achieves even higher performance on the BIRD-SQL dev set. Dubo-SQL v1 relies on LLMs from OpenAI, but uses the low-cost GPT-3.5 Turbo while exceeding the performance of the next-best model using OpenAI, which instead uses the more expensive GPT-4. Dubo-SQL v1 exceeds the performance of the next-best model using GPT-3.5 by over 20%. Dubo-SQL v2 uses GPT-4 Turbo and RAG in place of fine tuning to push EX higher.
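
As a rough illustration of the EX metric the abstract refers to, the sketch below runs a predicted query and a reference query against the same database and counts a hit when both return the same set of rows. It is a minimal sketch under stated assumptions, not the official BIRD-SQL evaluation script: the `singer` table, the example queries, and the helper names `execution_match` and `execution_accuracy` are all hypothetical.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Rows are compared as sets, so row order and duplicates are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Hypothetical toy database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
    """
)

pairs = [
    # Different SQL text, same rows: counts as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    ("SELECT name FROM singer ORDER BY age LIMIT 1",
     "SELECT name FROM singer WHERE age = 25"),
    # Misspelled column: the query fails, so it counts as incorrect.
    ("SELECT nam FROM singer", "SELECT name FROM singer"),
    # Runs fine but returns a different count: incorrect.
    ("SELECT COUNT(*) FROM singer WHERE age > 30",
     "SELECT COUNT(*) FROM singer"),
]

ex = execution_accuracy(conn, pairs)
print(f"EX = {ex:.2%}")  # EX = 50.00%
```

Because the comparison is on results rather than on SQL text, two differently written queries can both be counted as correct, which is why EX is usually preferred over exact string match for this task.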

What problem does this paper attempt to address?

This paper aims to improve the execution accuracy (EX) of automated text-to-SQL conversion while reducing computational cost and increasing processing speed. Current state-of-the-art (SOTA) text-to-SQL systems still score far below expert humans on the BIRD-SQL benchmark, and the most accurate methods are usually both slow and expensive. To advance the SOTA while cutting cost and latency, the authors explore the following methods:

1. **Low-cost fine-tuning**: Fine-tune a lower-cost model to improve performance.
2. **Diverse Retrieval-Augmented Generation (RAG)**: Enhance the generation process by selecting diverse examples.
3. **New input and output formats**: Design new formats that help large language models (LLMs) better understand and generate SQL queries.

Specifically, the authors introduce two new methods:

- **Dubo-SQL v1**: Fine-tunes OpenAI's GPT-3.5 Turbo, achieving the highest recorded execution accuracy on the BIRD-SQL test set at a lower cost than other models that use the more expensive GPT-4.
- **Dubo-SQL v2**: Uses GPT-4 Turbo and RAG to further improve execution accuracy on the BIRD-SQL development set.

### Main contributions

1. **Low-cost fine-tuning**: Dubo-SQL v1 shows how fine-tuning the low-cost GPT-3.5 Turbo can exceed the performance of models that use the more expensive GPT-4.
2. **Diverse RAG**: Dubo-SQL v2 improves generation quality by selecting diverse examples, further increasing execution accuracy.
3. **New input and output formats**: Improved input and output formats enable LLMs to better understand and generate SQL queries.

### Experimental results

- **Dubo-SQL v1**: Achieved an execution accuracy of 60.71% on the BIRD-SQL test set, significantly outperforming other models based on GPT-3.5 Turbo.
- **Dubo-SQL v2**: Achieved an execution accuracy of 61.47% on the BIRD-SQL development set, which is 1.63% higher than Dubo-SQL v1 but slightly lower than MCS-SQL and GRA-SQL.

### Cost analysis

- **Dubo-SQL v1**: The training cost is $273, and the inference cost is less than $0.01 per natural language question.
- **Dubo-SQL v2**: The inference cost is relatively high, approximately $0.14 per natural language question, but still lower than the cost of DIN-SQL.

### Conclusion

By introducing low-cost fine-tuning and diverse RAG methods, the authors improved execution accuracy on the text-to-SQL task and significantly reduced computational cost. These methods provide new directions for future research, especially when dealing with large-scale enterprise databases.
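
The cost figures above can be turned into a small worked calculation. The sketch below is only arithmetic on the quoted numbers: it treats the v1 inference cost as exactly $0.01 per question (an upper bound, since the summary says "less than $0.01") and the v2 cost as exactly $0.14, and it ignores any difference in accuracy between the two systems. The function names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
# Figures quoted in the cost analysis, expressed in cents.
V1_TRAINING_CENTS = 27300      # one-time fine-tuning cost of $273
V1_PER_QUESTION_CENTS = 1      # upper bound: "less than $0.01" per question
V2_PER_QUESTION_CENTS = 14     # approximately $0.14 per question


def total_cost_v1_cents(num_questions):
    """One-time training cost plus per-question inference cost for v1."""
    return V1_TRAINING_CENTS + V1_PER_QUESTION_CENTS * num_questions


def total_cost_v2_cents(num_questions):
    """v2 has no fine-tuning step, so only inference is counted."""
    return V2_PER_QUESTION_CENTS * num_questions


def break_even_questions():
    """Smallest question count at which v1 costs no more than v2."""
    saving_per_question = V2_PER_QUESTION_CENTS - V1_PER_QUESTION_CENTS
    # Ceiling division on integers.
    return -(-V1_TRAINING_CENTS // saving_per_question)


n = break_even_questions()
print(n)                              # 2100
print(total_cost_v1_cents(n) / 100)   # 294.0
print(total_cost_v2_cents(n) / 100)   # 294.0
```

Under these assumptions, the one-time fine-tuning cost of v1 is recovered after roughly 2,100 questions. Beyond that point v1 is the cheaper option, while for smaller workloads the lack of a training step makes v2 cheaper overall.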

Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL

SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement

Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement

Leveraging Prior Experience: An Expandable Auxiliary Knowledge Base for Text-to-SQL

DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker

Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging

SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL

DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models

Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Text2SQL is Not Enough: Unifying AI and Databases with TAG

Benchmarking and Improving Text-to-SQL Generation under Ambiguity

Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison

CatSQL: Towards Real World Natural Language to SQL Applications.

DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL

LR-SQL: A Supervised Fine-Tuning Method for Text2SQL Tasks under Low-Resource Scenarios