Abstract: Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions. It is desired to enhance LLMs to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose a novel data augmentation method, called QDA-SQL, which generates multiple types of multi-turn Q&A pairs by using LLMs. In QDA-SQL, we introduce a novel data augmentation method incorporating validation and correction mechanisms to handle complex multi-turn Text-to-SQL tasks. Experimental results demonstrate that QDA-SQL enables fine-tuned models to exhibit higher performance on SQL statement accuracy and enhances their ability to handle complex, unanswerable questions in multi-turn Text-to-SQL tasks. The generation script and test set are released at https://github.com/mcxiaoxiao/QDA-SQL.

What problem does this paper attempt to address?

The paper addresses the inadequate performance of existing Text-to-SQL models in multi-turn conversations when they face complex, unanswerable, or inappropriate questions. Specifically:

1. **Ambiguous questions in multi-turn conversations**: For example, a user may use a word (such as "Glenn") that could correspond to different columns (such as "donator_name" or "school_name") in different tables. The system needs to recognize this ambiguity and seek clarification from the user instead of providing a potentially wrong answer.
2. **Unanswerable questions in multi-turn conversations**: For example, if the database lacks information about the donors' nationalities, the system needs to explain why it cannot answer the question instead of attempting to generate a wrong SQL query.
3. **Inappropriate questions in multi-turn conversations**: For example, a user may ask everyday conversational questions that are unrelated to the database. The system needs to recognize these and respond appropriately instead of attempting to generate an irrelevant SQL query.

To address these problems, the paper proposes the QDA-SQL method, which enhances the capabilities of large language models (LLMs) by generating multiple types of multi-turn question-answer pairs, enabling them to better handle various types of questions in multi-turn conversations.

The QDA-SQL method includes the following key steps:

- **Interaction generation**: By randomly combining topic relationships and question-answer types, the method guides LLMs to generate diverse multi-turn question-answer pairs.
- **Verification and optimization**: The method checks whether the generated question-answer pairs conform to the expected question-answer types, refines the wording to improve naturalness and readability, and ensures the quality of the generated SQL queries through SQL execution scoring.
- **State-flow design**: The Text-to-SQL reasoning process is modeled as a state machine (StateFlow), and dynamic prompting and state-management mechanisms ensure that LLMs obtain relevant guidance at each step, thereby improving the robustness and reliability of the model.

Through these methods, QDA-SQL aims to significantly improve the performance of Text-to-SQL models in handling complex multi-turn conversation tasks.
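
The verification idea described above can be illustrated with a small Python sketch. This is not the authors' implementation: the `QuestionType` labels, the `EXPECTED_ACTION` mapping, and the helpers `sql_executes`, `keep_sample`, and `build_demo_db` are all hypothetical names chosen for illustration, and the binary "does the query run" check is a simplified stand-in for the SQL execution scoring mentioned in the summary. The demo schema reuses the `donator_name` and `school_name` columns from the example above; the table names and the `amount` column are assumptions.

```python
import sqlite3
from enum import Enum
from typing import Optional


class QuestionType(Enum):
    # Illustrative labels based on the question categories described above.
    ANSWERABLE = "answerable"
    AMBIGUOUS = "ambiguous"
    UNANSWERABLE = "unanswerable"
    INAPPROPRIATE = "inappropriate"


# Hypothetical mapping from question type to the response the system should give.
EXPECTED_ACTION = {
    QuestionType.ANSWERABLE: "generate_sql",
    QuestionType.AMBIGUOUS: "ask_for_clarification",
    QuestionType.UNANSWERABLE: "explain_missing_information",
    QuestionType.INAPPROPRIATE: "reply_without_sql",
}


def sql_executes(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the query runs without a database error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def keep_sample(
    conn: sqlite3.Connection,
    expected: QuestionType,
    labelled: QuestionType,
    sql: Optional[str],
) -> bool:
    """Keep a generated turn only if its type matches the intended type and,
    for answerable turns, its SQL executes against the database."""
    if labelled is not expected:
        return False
    if expected is QuestionType.ANSWERABLE:
        return sql is not None and sql_executes(conn, sql)
    # Other turn types should carry a natural-language reply, not a query.
    return sql is None


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (assumed schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE donors (donator_name TEXT, amount REAL)")
    conn.execute("CREATE TABLE schools (school_name TEXT)")
    conn.execute("INSERT INTO donors VALUES ('Glenn', 100.0)")
    return conn
```

In this sketch, a turn generated as "unanswerable" (for instance, one asking for donor nationality) is discarded if it was labelled answerable or if it carries a SQL query, while an answerable turn is kept only when its query executes. A fuller pipeline would presumably also compare execution results and rewrite rejected samples, which this example does not attempt.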

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational Question Answering

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

CQR-SQL: Conversational Question Reformulation Enhanced Context-Dependent Text-to-SQL Parsers

Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency

MIGA: A Unified Multi-task Generation Framework for Conversational Text-to-SQL

TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension

Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

SA-SQL: A Schema-Aligned Framework for Text-to-SQL Through Large Language Models

Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain

MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL

Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training

Large Language Model Enhanced Text-to-SQL Generation: A Survey

XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL