Abstract:Schema linking is a crucial step in Text-to-SQL pipelines. Its goal is to retrieve the relevant tables and columns of a target database for a user's query while disregarding irrelevant ones. However, imperfect schema linking can often exclude required columns needed for accurate query generation. In this work, we revisit schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at utilizing relevant schema elements during generation even in the presence of large numbers of irrelevant ones. As such, our Text-to-SQL pipeline entirely forgoes schema linking in cases where the schema fits within the model's context window in order to minimize issues due to filtering required schema elements. Furthermore, instead of filtering contextual information, we highlight techniques such as augmentation, selection, and correction, and adopt them to improve the accuracy of our Text-to-SQL pipeline. Our approach ranks first on the BIRD benchmark achieving an accuracy of 71.83%.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve? This paper aims to explore the necessity and effectiveness of schema linking in the Text - to - SQL conversion task in the era of modern large - language models (LLMs). Specifically, the paper mainly focuses on the following aspects: 1. **The role of schema linking**: - Schema linking is a crucial step in the Text - to - SQL pipeline. Its goal is to retrieve tables and columns relevant to the user query from the target database while ignoring irrelevant elements. - However, imperfect schema linking may exclude columns required for generating accurate SQL queries, thus leading to errors. 2. **The impact of LLMs on schema linking**: - The paper experimentally proves that as the reasoning ability of LLMs improves, these models can still correctly utilize relevant schema elements in the presence of a large number of irrelevant elements. - Therefore, when the schema can be fully placed in the model's context window, the schema linking step can be completely omitted to reduce problems caused by filtering required schema elements. 3. **Proposing alternatives**: - In order to improve the accuracy of the Text - to - SQL pipeline, the paper proposes using techniques such as augmentation, selection, and correction instead of relying on schema linking to filter context information. 4. **Experimental verification**: - The paper verifies the impact of different schema linking methods, augmentation techniques, selection techniques, and correction techniques on SQL generation accuracy through a series of experiments. - The experimental results show that for the latest LLMs, the benefits of schema linking are reduced and may even lead to performance degradation; while for earlier or smaller models, schema linking can still bring significant accuracy improvements. 5. **Conclusion**: - The paper concludes that in the most advanced LLMs, if the schema can be fully placed within the context window, schema linking is not necessary. But for earlier or smaller models, the accuracy gain of schema linking is still worth considering. - In actual data warehouse scenarios, when the entire schema exceeds the context window, the LLM's context window should be maximally utilized, and the most relevant columns should be selected to retain necessary schema elements. In summary, through experiments and analysis, this paper challenges the traditional view that schema linking is essential in the Text - to - SQL task and provides new ideas and technical directions for future research and applications.

The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models

RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

SQL-to-Schema Enhances Schema Linking in Text-to-SQL

Semantic Enhanced Text-to-SQL Parsing Via Iteratively Learning Schema Linking Graph

E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

SA-SQL: A Schema-Aligned Framework for Text-to-SQL Through Large Language Models

RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

A Tale of Two Linkings: Dynamically Gating Between Schema Linking and Structural Linking for Text-to-SQL Parsing.

Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL

LI-EMRSQL: Linking Information Enhanced Text2SQL Parsing on Complex Electronic Medical Records

Schema Matching with Large Language Models: an Experimental Study

Analyzing the Effectiveness of Large Language Models on Text-to-SQL Synthesis

Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

CHESS: Contextual Harnessing for Efficient SQL Synthesis

Schema-Aware Multi-Task Learning for Complex Text-to-SQL

SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL

Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning

Improving Schema Matching with Linked Data

Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection