Abstract: The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks. In this paper, we introduce a synthetic data approach that combines data produced by larger, more powerful models (strong models) with error information data generated by smaller, less well-aligned models (weak models). The method not only enhances the domain generalization of text-to-SQL models but also explores the potential of error data supervision through preference learning. Furthermore, we employ the synthetic data approach for instruction tuning on open-source LLMs, resulting in SENSE, a specialized text-to-SQL model. The effectiveness of SENSE is demonstrated through state-of-the-art results on the SPIDER and BIRD benchmarks, bridging the performance gap between open-source models and methods that prompt closed-source models.

What problem does this paper attempt to address?

The paper addresses the performance gap between open-source large language models (LLMs) and closed-source LLMs (such as GPT-4) on the Text-to-SQL task, and aims to narrow it.

To do so, the researchers propose a synthetic data method that combines "Strong Data" with "Weak Data". Strong Data is generated by powerful closed-source or open-source LLMs and is meant to increase the diversity and complexity of the training data. Weak Data is generated by smaller, less well-aligned models; its errors are then used as supervision through Preference Learning, so that the model learns from mistakes.

With this data the researchers built SENSE, a model specialized for Text-to-SQL, using CodeLLaMA as the base model for Supervised Fine-tuning (SFT). According to the reported experiments, SENSE achieves state-of-the-art performance on the Spider and BIRD benchmarks and significantly improves the execution accuracy of open-source LLMs on Text-to-SQL, narrowing the gap with closed-source models.

The paper also evaluates SENSE on several robustness datasets, including SYN, REALISTIC, and DK, where it handles complex and varied inputs well. A fine-grained analysis by difficulty level and a series of ablation experiments further support the effectiveness of the method and the contribution of each component. Finally, the method is shown to transfer well across different types of LLMs.
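To make the "learn from mistakes" step concrete, the following is a minimal sketch of how weak-model outputs could be turned into preference pairs. It is not the authors' pipeline: it assumes that a candidate query is judged by executing it against a SQLite database and comparing its result with that of a reference query, and that the reference query is used as the preferred answer when no candidate is correct. The helper names `run_sql` and `build_preference_pairs`, the `prompt`/`chosen`/`rejected` record layout, and the toy `singer` table are all hypothetical.

```python
import sqlite3
from itertools import product


def run_sql(conn, sql):
    """Execute a query; return its rows in a canonical order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort so that row order does not affect the comparison (an assumption).
    return sorted(rows, key=repr)


def build_preference_pairs(conn, question, gold_sql, weak_candidates):
    """Pair execution-correct SQL (chosen) with execution-wrong SQL (rejected)."""
    gold_rows = run_sql(conn, gold_sql)
    if gold_rows is None:
        raise ValueError("reference SQL failed to execute")

    correct, wrong = [], []
    for sql in weak_candidates:
        bucket = correct if run_sql(conn, sql) == gold_rows else wrong
        bucket.append(sql)

    # Assumption: fall back to the reference query if no candidate is correct.
    if not correct:
        correct = [gold_sql]

    return [
        {"prompt": question, "chosen": good, "rejected": bad}
        for good, bad in product(correct, wrong)
    ]


# Toy database standing in for a benchmark schema (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30);
    INSERT INTO singer VALUES (2, 'Bob', 45);
    INSERT INTO singer VALUES (3, 'Cy', 52);
    """
)

pairs = build_preference_pairs(
    conn,
    question="How many singers are older than 40?",
    gold_sql="SELECT COUNT(*) FROM singer WHERE age > 40",
    weak_candidates=[
        "SELECT COUNT(*) FROM singer WHERE age > 40",   # matches the reference
        "SELECT COUNT(*) FROM singer WHERE age >= 30",  # runs, wrong result
        "SELECT COUNT(*) FROM singers WHERE age > 40",  # fails: no such table
    ],
)
for pair in pairs:
    print(pair["chosen"], "| preferred over |", pair["rejected"])
```

Records in this shape could then be passed to a preference-learning trainer. Both kinds of failure, a wrong result and an execution error, end up on the rejected side, which is one plausible reading of "error information data".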

Synthesizing Text-to-SQL Data from Weak and Strong LLMs
