Synthesizing Text-to-SQL Data from Weak and Strong LLMs

Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou
2024-08-06
Abstract: The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks. In this paper, we introduce a synthetic data approach that combines data produced by larger, more powerful models (strong models) with error information data generated by smaller, not well-aligned models (weak models). The method not only enhances the domain generalization of text-to-SQL models but also explores the potential of error data supervision through preference learning. Furthermore, we employ the synthetic data approach for instruction tuning on open-source LLMs, resulting in SENSE, a specialized text-to-SQL model. The effectiveness of SENSE is demonstrated through state-of-the-art results on the SPIDER and BIRD benchmarks, bridging the performance gap between open-source models and methods prompted by closed-source models.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the performance gap between open-source large language models (LLMs) and closed-source LLMs (such as GPT-4) on the text-to-SQL task. To narrow that gap, the researchers propose a synthetic data method that combines "strong data" with "weak data". Strong data is generated by powerful closed-source or open-source LLMs and is meant to increase the diversity and complexity of the training data. Weak data is generated by smaller, less well-aligned models; its erroneous outputs are then used as supervision through preference learning, so that the model learns from mistakes.
Using this data, the researchers build SENSE, a model specialized for text-to-SQL, with CodeLLaMA as the base model for supervised fine-tuning (SFT). According to the reported experiments, SENSE achieves state-of-the-art results on the Spider and BIRD benchmarks and raises the execution accuracy of open-source LLMs, narrowing the gap with closed-source models. The paper also evaluates SENSE on several robustness datasets, including SYN, REALISTIC, and DK, to test how well it handles varied and perturbed inputs.
A fine-grained analysis by difficulty level and a series of ablation experiments are used to validate the method and the contribution of each component. Finally, the paper reports that the method transfers across different types of LLMs.
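The weak-data step and the execution-accuracy metric mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy `singer` schema, and the choice to use the reference SQL as the preferred answer are all assumptions made for illustration. The idea is only that each SQL candidate from a weak model is executed against the database, compared with the result of a reference query, and failing candidates become the "rejected" side of a preference pair.

```python
import sqlite3


def run_query(conn, sql):
    """Return the query's rows in a canonical order, or None if execution fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Order-insensitive comparison is a simplification; queries with
    # ORDER BY would need an order-sensitive check instead.
    return sorted(rows, key=repr)


def build_preference_pairs(conn, question, gold_sql, candidates):
    """Pair the reference SQL (chosen) with each candidate whose result differs (rejected)."""
    gold_rows = run_query(conn, gold_sql)
    if gold_rows is None:
        return []  # an unusable reference query yields no supervision
    pairs = []
    for cand in candidates:
        if run_query(conn, cand) != gold_rows:
            pairs.append({"prompt": question, "chosen": gold_sql, "rejected": cand})
    return pairs


def execution_accuracy(conn, predictions, gold_sql):
    """Fraction of predicted queries whose result matches the reference result."""
    gold_rows = run_query(conn, gold_sql)
    hits = sum(run_query(conn, p) == gold_rows for p in predictions)
    return hits / len(predictions)


def demo():
    # Hypothetical toy database standing in for a benchmark schema.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
        """
    )
    question = "How many singers are older than 40?"
    gold = "SELECT COUNT(*) FROM singer WHERE age > 40"
    candidates = [
        "SELECT COUNT(*) FROM singer WHERE age > 40",   # same result as the reference
        "SELECT COUNT(*) FROM singer WHERE age >= 30",  # executes, wrong result
        "SELECT COUNT(*) FROM singers WHERE age > 40",  # fails: no such table
        "SELECT COUNT(id) FROM singer WHERE age > 40",  # different text, same result
    ]
    pairs = build_preference_pairs(conn, question, gold, candidates)
    accuracy = execution_accuracy(conn, candidates, gold)
    return pairs, accuracy


if __name__ == "__main__":
    pairs, accuracy = demo()
    for pair in pairs:
        print(pair["rejected"])
    print(accuracy)
```

Pairs in this `prompt` / `chosen` / `rejected` shape are what preference-learning objectives typically consume. Comparing execution results rather than SQL text means a candidate that is worded differently but returns the same rows is not penalized.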