Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL

Dayton G. Thorpe, Andrew J. Duberstein, Ian A. Kinsey
2024-04-19
Abstract: The current state-of-the-art (SOTA) for automated text-to-SQL still falls well short of expert human performance as measured by execution accuracy (EX) on the BIRD-SQL benchmark. The most accurate methods are also slow and expensive. To advance the SOTA for text-to-SQL while reducing cost and improving speed, we explore the combination of low-cost fine tuning, novel methods for diverse retrieval-augmented generation (RAG) and new input and output formats that help large language models (LLMs) achieve higher EX. We introduce two new methods, Dubo-SQL v1 and v2. Dubo-SQL v1 sets a new record for EX on the holdout test set of BIRD-SQL. Dubo-SQL v2 achieves even higher performance on the BIRD-SQL dev set. Dubo-SQL v1 relies on LLMs from OpenAI, but uses the low-cost GPT-3.5 Turbo while exceeding the performance of the next-best model using OpenAI, which instead uses the more expensive GPT-4. Dubo-SQL v1 exceeds the performance of the next-best model using GPT-3.5 by over 20%. Dubo-SQL v2 uses GPT-4 Turbo and RAG in place of fine tuning to push EX higher.
Computation and Language, Databases
What problem does this paper attempt to address?
The paper aims to improve the execution accuracy (EX) of automated text-to-SQL conversion while reducing computational cost and increasing speed. State-of-the-art (SOTA) text-to-SQL systems still fall far short of expert human execution accuracy on the BIRD-SQL benchmark, and the most accurate methods are usually both slow and expensive. To advance the SOTA while cutting cost and improving speed, the authors explore three ideas:

1. **Low-cost fine-tuning**: fine-tune a lower-cost model to improve performance.
2. **Diverse retrieval-augmented generation (RAG)**: enhance the generation process by selecting diverse examples.
3. **New input and output formats**: design formats that help large language models (LLMs) better understand the task and generate SQL queries.

Specifically, the authors introduce two new methods:

- **Dubo-SQL v1**: fine-tunes OpenAI's GPT-3.5 Turbo and sets a new record for execution accuracy on the BIRD-SQL holdout test set, outperforming the next-best OpenAI-based model, which uses the more expensive GPT-4.
- **Dubo-SQL v2**: uses GPT-4 Turbo with RAG in place of fine-tuning to push execution accuracy higher on the BIRD-SQL development set.

### Main contributions

1. **Low-cost fine-tuning**: Dubo-SQL v1 shows that fine-tuning the low-cost GPT-3.5 Turbo can exceed the performance of models that use the more expensive GPT-4.
2. **Diverse RAG**: Dubo-SQL v2 improves generation quality by selecting diverse examples, further increasing execution accuracy.
3. **New input and output formats**: improved input and output formats help LLMs better understand the task and generate SQL queries.

### Experimental results

- **Dubo-SQL v1**: achieved an execution accuracy of 60.71% on the BIRD-SQL test set, exceeding the next-best GPT-3.5-based model by over 20%.
- **Dubo-SQL v2**: achieved an execution accuracy of 61.47% on the BIRD-SQL development set, 1.63% higher than Dubo-SQL v1 but slightly lower than MCS-SQL and GRA-SQL.

### Cost analysis

- **Dubo-SQL v1**: training costs $273, and inference costs less than $0.01 per natural-language question.
- **Dubo-SQL v2**: inference is comparatively expensive, at approximately $0.14 per natural-language question, but still cheaper than DIN-SQL.

### Conclusion

By combining low-cost fine-tuning with diverse RAG, the authors improve execution accuracy on the text-to-SQL task while reducing cost relative to comparable GPT-4-based methods, most notably with Dubo-SQL v1. These methods suggest directions for future research, especially for large-scale enterprise databases.
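The summary says diverse RAG improves generation by selecting diverse examples, but it does not spell out the selection rule. As an illustration only, the sketch below uses maximal marginal relevance (MMR), a standard relevance-versus-diversity heuristic that is swapped in here and is not claimed to be Dubo-SQL's method. The function names, the `lam` trade-off parameter, and the toy vectors are hypothetical; in a real pipeline the vectors would come from an embedding model applied to the user's question and to each candidate training example.

```python
"""Diverse example selection via maximal marginal relevance (MMR).

Hypothetical sketch: MMR is swapped in to illustrate "diverse RAG"; it is not
claimed to be Dubo-SQL's selection rule. Embedding vectors are assumed to be
precomputed for the user question and for each candidate training example.
"""
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_diverse_examples(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> list[int]:
    """Greedily return up to k candidate indices, trading relevance for diversity.

    lam=1.0 ranks purely by similarity to the query; smaller values penalize
    candidates that closely resemble an example already selected.
    """
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to any already-selected example (0.0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings": examples 0 and 1 are near-duplicates, 2 is distinct.
    question = [1.0, 1.0]
    pool = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
    print(select_diverse_examples(question, pool, k=2, lam=1.0))  # [0, 1]
    print(select_diverse_examples(question, pool, k=2, lam=0.5))  # [0, 2]
```

With `lam=1.0` the two near-duplicate examples are chosen; lowering `lam` to 0.5 replaces the second one with the more distinct example. How the selected question/SQL pairs are then formatted into the prompt, and what value of `lam` works best, are left as assumptions here.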