Abstract: In this paper, we propose STAR, a novel SQL-guided pre-training framework for context-dependent text-to-SQL parsing that leverages contextual information to enrich the representations of natural language (NL) utterances and table schemas in text-to-SQL conversations. Concretely, we propose two novel pre-training objectives that explore the context-dependent interactions of NL utterances and SQL queries within each text-to-SQL conversation: (i) a schema state tracking (SST) objective, which tracks the schema states of context-dependent SQL queries by predicting and updating the value of each schema slot as the interaction proceeds; and (ii) an utterance dependency tracking (UDT) objective, which employs weighted contrastive learning to pull together the representations of semantically similar NL utterances and push apart those of semantically dissimilar NL utterances within each conversation. In addition, we construct a high-quality, large-scale corpus of context-dependent text-to-SQL conversations to pre-train STAR. Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks (SParC and CoSQL), significantly outperforming previous pre-training methods and ranking first on the leaderboard. We believe the release of the constructed corpus, codebase, and pre-trained STAR checkpoints will push forward research in this area. For reproducibility, we release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/star.
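The abstract describes the two objectives only at a high level. The sketch below illustrates them under simplifying assumptions of our own and is not the released STAR code: a schema state is approximated as a mapping from each column (slot) to the SQL clauses it appears in, produced by a crude regex tagger rather than a real SQL parser; the slots SST must update between consecutive turns are taken to be those whose value changes; and UDT is approximated by a generic weighted InfoNCE loss over cosine similarities, where the per-utterance weights (assumed here to be SQL-similarity scores in [0, 1]) are supplied by the caller. The names `schema_state`, `changed_slots`, `cosine`, and `weighted_contrastive_loss`, and the `[none]` slot value, are hypothetical.

```python
import math
import re

# Clause keywords used as slot values in this toy schema state (an assumption;
# STAR's actual slot vocabulary may differ).
CLAUSES = ("select", "where", "group by", "having", "order by")


def schema_state(sql: str, columns: list[str]) -> dict[str, list[str]]:
    """Hypothetical toy tagger: map each column to the clauses it appears in."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", sql)]
    lookup = {c.lower(): c for c in columns}
    state: dict[str, list[str]] = {c: [] for c in columns}
    clause = None  # clause currently being scanned; None inside FROM/JOIN
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in CLAUSES:  # two-word keywords such as "group by"
            clause, i = pair, i + 2
            continue
        tok = tokens[i]
        if tok in CLAUSES:
            clause = tok
        elif tok == "from":
            clause = None
        else:
            col = lookup.get(tok.split(".")[-1])  # strip alias prefixes like "T1."
            if clause and col and clause not in state[col]:
                state[col].append(clause)
        i += 1
    # Columns the query never mentions get an explicit empty-slot value.
    return {c: v or ["[none]"] for c, v in state.items()}


def changed_slots(prev: dict[str, list[str]], cur: dict[str, list[str]]) -> list[str]:
    """Slots whose value differs between consecutive turns (what SST must update)."""
    return sorted(c for c in cur if prev.get(c) != cur[c])


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def weighted_contrastive_loss(anchor, others, weights, tau=0.1):
    """Generic weighted InfoNCE (an assumed stand-in for UDT, not STAR's exact loss).

    Each other utterance j gets a weight w_j >= 0 (e.g. a SQL-similarity score).
    High-weight utterances are pulled toward the anchor, while the softmax
    denominator pushes every other utterance away.
    """
    if sum(weights) <= 0:
        raise ValueError("at least one weight must be positive")
    logits = [cosine(anchor, o) / tau for o in others]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(w * (z - log_z) for w, z in zip(weights, logits)) / sum(weights)


if __name__ == "__main__":
    cols = ["name", "age", "country"]
    prev = schema_state("SELECT name FROM singer WHERE age > 30", cols)
    cur = schema_state("SELECT name FROM singer WHERE age > 30 ORDER BY age", cols)
    print(cur)  # {'name': ['select'], 'age': ['where', 'order by'], 'country': ['[none]']}
    print(changed_slots(prev, cur))  # ['age']
    anchor, others = [1.0, 0.0], [[0.9, 0.1], [-1.0, 0.0]]
    print(weighted_contrastive_loss(anchor, others, [1.0, 0.0]))
```

Dividing by the total weight keeps the loss on a comparable scale across conversations of different lengths. Approximating the published objective would require STAR's own encoder outputs and similarity scores in place of the toy vectors and weights used here.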

ConDA: State-Based Data Augmentation for Context-Dependent Text-to-SQL

Contextual Data Augmentation for Task-Oriented Dialog Systems

TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model

STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing

Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

Improving Grammatical Error Correction via Contextual Data Augmentation

IGSQL: Database Schema Interaction Graph Based Neural Model for Context-Dependent Text-to-SQL Generation

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

DAC: Decomposed Automation Correction for Text-to-SQL

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

CQR-SQL: Conversational Question Reformulation Enhanced Context-Dependent Text-to-SQL Parsers

An efficient text augmentation approach for contextualized Mandarin speech recognition

Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation

Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks

Semi-Automatic Construction of Text-to-SQL Data for Domain Transfer