SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement

Chaofan Li, Yingxia Shao, Zheng Liu

2024-08-09

Abstract: Recent advancements in large language models (LLMs) have significantly contributed to the progress of the Text-to-SQL task. A common requirement in many of these works is the post-correction of SQL queries. However, most of this process entails analyzing error cases to develop prompts with rules that eliminate model bias, and the SQL queries lack execution verification. In addition, the prevalent techniques depend primarily on GPT-4 and few-shot prompts, which makes them expensive. To investigate effective methods for SQL refinement in a cost-efficient manner, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL), which includes Adaptive Bias Elimination and Dynamic Execution Adjustment and aims to improve performance while minimizing resource expenditure with zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced schema to augment database information and optimize SQL queries. During SQL query generation, a fine-tuned adaptive bias eliminator is applied to mitigate inherent biases caused by the LLM. Dynamic execution adjustment is then used to guarantee the executability of the bias-eliminated SQL query. We conduct experiments on the Spider and BIRD datasets to demonstrate the effectiveness of this framework. The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 scenario with 9%-58% of the generation cost. Furthermore, SEA-SQL is comparable to GPT-4 with only 0.9%-5.3% of the generation cost.
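The dynamic execution adjustment step can be pictured as an execute, reflect, correct loop. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it uses Python's built-in `sqlite3` module as the executor, treats only a raised `sqlite3.Error` as a failure, and replaces the LLM-based reflection step with a caller-supplied `correct(sql, error)` callable. The names `execute_with_adjustment`, `correct`, `toy_correct`, and `max_rounds` are hypothetical.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical corrector signature: (failing_sql, error_message) -> revised_sql.
# In SEA-SQL this reflection/correction role is played by an LLM.
Corrector = Callable[[str, str], str]


def execute_with_adjustment(
    conn: sqlite3.Connection,
    sql: str,
    correct: Corrector,
    max_rounds: int = 3,
) -> Tuple[str, Optional[List[tuple]]]:
    """Execute `sql`; on an execution error, ask `correct` for a revision.

    Returns the last SQL attempted and its rows, or None as the rows if the
    query still fails after `max_rounds` corrections.
    """
    for attempt in range(max_rounds + 1):
        try:
            return sql, conn.execute(sql).fetchall()  # execution succeeded
        except sqlite3.Error as exc:
            if attempt == max_rounds:
                break  # correction budget exhausted
            # Reflection step: feed the database error message back to the corrector.
            sql = correct(sql, str(exc))
    return sql, None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT)")
    conn.execute("INSERT INTO singer VALUES ('Ann')")

    # Toy corrector standing in for an LLM: fixes one known table-name typo.
    def toy_correct(sql: str, error: str) -> str:
        return sql.replace("singers", "singer") if "no such table" in error else sql

    print(execute_with_adjustment(conn, "SELECT name FROM singers", toy_correct))
    # prints ('SELECT name FROM singer', [('Ann',)])
```

In a real pipeline the corrector would prompt a model with the failing query and the database error message. Bounding the number of rounds keeps the extra generation cost predictable, which fits the cost focus described in the abstract.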

Databases

What problem does this paper attempt to address?

The paper targets three major limitations of existing Text-to-SQL methods:

1. **Model bias**: Large language models (LLMs) exhibit inherent biases when generating SQL queries. For example, GPT-3.5 is case-insensitive and may incorrectly convert uppercase letters in the question to lowercase in the generated SQL.
2. **Unexecutable SQL queries**: Although LLMs can generate SQL queries, they cannot determine whether a generated query is executable, so the output may contain syntax or logical errors that prevent it from running correctly.
3. **Expensive inference cost**: Many existing methods rely on GPT-4 for SQL generation, which is costly. Moreover, most GPT-4-based methods use few-shot prompting, whose in-prompt examples lengthen every request and raise the cost further.

To address these problems, the authors propose **Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL)**, which improves the Text-to-SQL task in three ways:

- **Semantic-enhanced schema**: augments the database information given to the model to optimize SQL query generation.
- **Adaptive bias elimination**: uses a fine-tuned small LLM (such as Mistral-7B) to remove the inherent biases from SQL queries generated by the large LLM.
- **Dynamic execution adjustment**: ensures that generated SQL queries are executable, and improves their accuracy, through an iterative process of execution, reflection, and correction.

With these components, SEA-SQL improves performance at a significantly lower generation cost than other methods; notably, when built on GPT-3.5 it achieves results comparable to GPT-4-based methods.
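The case-sensitivity bias is easy to reproduce because SQLite's default `BINARY` collation makes `=` on text case-sensitive, so a literal that the model lowercased silently returns no rows. The sketch below shows that failure and a deliberately simple, rule-based repair that restores each quoted literal to the casing stored in one given column. It is a hypothetical illustration only: SEA-SQL's bias eliminator is a fine-tuned small LLM, not a regex pass, and `restore_literal_case` is not a name from the paper. It also naively checks every literal against a single column.

```python
import re
import sqlite3


def restore_literal_case(conn: sqlite3.Connection, sql: str, table: str, column: str) -> str:
    """Rewrite each single-quoted literal in `sql` to the value stored in
    table.column that equals it case-insensitively, when exactly one exists.

    Hypothetical rule-based illustration; not SEA-SQL's learned bias eliminator.
    """

    def fix(match: re.Match) -> str:
        literal = match.group(1).replace("''", "'")  # undo SQL quote escaping
        rows = conn.execute(
            f'SELECT DISTINCT "{column}" FROM "{table}" '
            f'WHERE lower("{column}") = lower(?)',
            (literal,),
        ).fetchall()
        if len(rows) == 1:
            stored = str(rows[0][0]).replace("'", "''")  # re-escape for SQL
            return f"'{stored}'"
        return match.group(0)  # absent or ambiguous: leave the literal unchanged

    # Match '...' literals, allowing doubled '' escapes inside them.
    return re.sub(r"'((?:[^']|'')*)'", fix, sql)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", "USA"), ("Bo", "France")])

    biased = "SELECT name FROM singer WHERE country = 'usa'"  # model lowercased 'USA'
    print(conn.execute(biased).fetchall())  # prints [] (BINARY collation is case-sensitive)
    fixed = restore_literal_case(conn, biased, "singer", "country")
    print(fixed)                            # prints SELECT name FROM singer WHERE country = 'USA'
    print(conn.execute(fixed).fetchall())   # prints [('Ann',)]
```

A hand-written rule like this covers only the one symptom it was written for. The abstract contrasts such rule-based corrections with a fine-tuned eliminator, which is meant to adapt to the biases the generating model actually exhibits.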
