SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

2024-07-19

Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced SQL statements. However, the potential of open-source LLMs in Text-to-SQL applications remains underexplored, with many frameworks failing to leverage their full capabilities, particularly in handling complex database queries and incorporating feedback for iterative refinement. Addressing these limitations, this paper introduces SQLfuse, a robust system integrating open-source LLMs with a suite of tools to enhance Text-to-SQL translation's accuracy and usability. SQLfuse features four modules: schema mining, schema linking, SQL generation, and a SQL critic module, to not only generate but also continuously enhance SQL query quality. Demonstrated by its leading performance on the Spider Leaderboard and deployment by Ant Group, SQLfuse showcases the practical merits of open-source LLMs in diverse business contexts.

Computation and Language, Artificial Intelligence, Databases

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses several key issues in Text-to-SQL conversion and aims to improve the accuracy and usability of the process by introducing a new system called SQLfuse. Specifically:

1. **Limitations of Existing Frameworks**: Current Text-to-SQL frameworks based on large language models (LLMs) fail to fully leverage the capabilities of open-source LLMs, particularly in handling complex database queries and integrating feedback for iterative improvement.
2. **Handling Complex Relationships**: Existing Text-to-SQL systems often overlook one-to-many relationships between tables and the correspondence between enumerated values and natural language, both of which matter especially when constructing aggregate queries.
3. **Utilizing Execution Error Feedback**: Existing systems typically do not use execution error feedback to correct inaccurate SQL, even though such feedback can provide valuable correction clues.
4. **Lack of Evaluation Module**: Existing systems lack a module that assesses the SQL candidates generated by LLMs and selects the best one, a step that can significantly improve result quality.

To address these issues, the paper proposes SQLfuse, which consists of four synergistic modules: schema mining, schema linking, SQL generation (SQLgen), and a SQL critic module that evaluates candidate queries. Together, these modules not only generate SQL queries but also iteratively refine them to improve query quality. SQLfuse achieved an accuracy of 85.6% on the Spider Leaderboard and has been validated in practical applications at Ant Group.
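Two of the ideas summarized above, mining one-to-many relationships and enumerated values from the schema and using execution errors as feedback, can be illustrated with a small sketch. This is not SQLfuse's implementation: it assumes a SQLite database, the helper names (`mine_schema`, `execute_with_feedback`, `build_demo_db`) and the demo tables are hypothetical, and a hard-coded list of candidate queries stands in for LLM output. A real critic module would rank candidates rather than simply take the first one that executes.

```python
import sqlite3


def mine_schema(conn, max_enum_values=5):
    """Collect foreign keys and low-cardinality text values (hypothetical helper)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    foreign_keys, enum_values = [], {}
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            foreign_keys.append((table, row[3], row[2], row[4]))
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            name, col_type = col[1], col[2]
            if col_type.upper() != "TEXT":
                continue
            # Fetch one value more than the limit to detect high-cardinality columns.
            values = [r[0] for r in conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'ORDER BY 1 LIMIT {max_enum_values + 1}')]
            if len(values) <= max_enum_values:
                enum_values[(table, name)] = values
    return foreign_keys, enum_values


def execute_with_feedback(conn, candidates):
    """Try candidates in order; return (sql, rows, errors) for the first that runs.

    The collected error messages are what a system could feed back to the
    generator as correction clues (here they are only returned).
    """
    errors = []
    for sql in candidates:
        try:
            return sql, conn.execute(sql).fetchall(), errors
        except sqlite3.Error as exc:
            errors.append((sql, str(exc)))
    return None, None, errors


def build_demo_db():
    """Create an in-memory demo database (illustrative tables, not from the paper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, tier TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'gold'), (2, 'silver');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.0);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(mine_schema(conn))
    candidates = [
        # Stand-in for a flawed LLM output: the column "total" does not exist.
        "SELECT c.tier, SUM(o.total) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id GROUP BY c.tier",
        "SELECT c.tier, SUM(o.amount) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id GROUP BY c.tier ORDER BY c.tier",
    ]
    print(execute_with_feedback(conn, candidates))
```

In this sketch the mined foreign key (`orders.customer_id` referencing `customers.id`) exposes the one-to-many relationship needed for the aggregate query, and the mined `tier` values show how a word like "gold" in a question could be matched to a stored value.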

