Enhancing Text2SQL Generation with Syntactic Information and Multi-task Learning

Haochen Li, Minghua Nuo
DOI: https://doi.org/10.1007/978-3-031-44213-1_32
2023-01-01
Abstract: The Text2SQL task aims to convert natural language (NL) questions into SQL queries. Most traditional rule-based syntactic parsers ignore the syntactic information in the question and unseen database schemas, which leads to lower generalization ability. To break through these limitations, we propose a novel model, Syn-RGAT, that leverages the syntactic information of questions. Specifically, our model jointly encodes database schemas and questions by mapping them into a graph, which is then passed through a relation-aware graph attention network (RGAT). We also map each question to a syntactic information graph and use a syntactic augmentation module to learn representations of all nodes in that graph. The outputs of the RGAT and the syntactic augmentation module are then integrated. Additionally, to help the model distinguish between different database schemas, we introduce a graph pruning task and form a multi-task framework that shares the encoder of the Text2SQL task. We use a heuristic learning method to combine the graph pruning task with Text2SQL. In experiments, Syn-RGAT outperforms all baseline models on the Spider dataset, and adding BERT further improves performance by more than 8%.
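To make the joint graph and the graph pruning task concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that question tokens are linked by dependency edges, that schema items are linked by table-to-column edges, and that pruning labels mark schema items appearing in the gold SQL. The names `JointGraph`, `build_joint_graph`, `pruning_labels`, the relation names, and the exact-match linking rule are all hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class JointGraph:
    """Hypothetical container for question and schema nodes with typed edges."""
    nodes: list = field(default_factory=list)  # (kind, label) pairs
    edges: list = field(default_factory=list)  # (src, dst, relation) triples

    def add_node(self, kind, label):
        self.nodes.append((kind, label))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, relation):
        self.edges.append((src, dst, relation))


def build_joint_graph(tokens, dep_heads, schema):
    """Build one graph over question tokens and schema items.

    tokens:    question words
    dep_heads: dep_heads[i] is the index of token i's syntactic head, -1 for the root
    schema:    dict mapping a table name to its list of column names
    """
    g = JointGraph()
    tok_ids = [g.add_node("question", t) for t in tokens]

    # Syntactic edges inside the question (assumed: dependency arcs).
    for child, head in enumerate(dep_heads):
        if head >= 0:
            g.add_edge(tok_ids[head], tok_ids[child], "dep")

    for table, columns in schema.items():
        t_id = g.add_node("table", table)
        for col in columns:
            c_id = g.add_node("column", col)
            g.add_edge(t_id, c_id, "has-column")
            # Assumed schema linking: exact string match between token and column.
            for i, tok in enumerate(tokens):
                if tok.lower() == col.lower():
                    g.add_edge(tok_ids[i], c_id, "exact-match")
        for i, tok in enumerate(tokens):
            if tok.lower() == table.lower():
                g.add_edge(tok_ids[i], t_id, "exact-match")
    return g


def pruning_labels(graph, gold_sql):
    """Assumed pruning target: 1 if a schema item occurs in the gold SQL, else 0."""
    words = set(gold_sql.lower().replace(",", " ").replace(".", " ").split())
    return [int(label.lower() in words)
            for kind, label in graph.nodes if kind != "question"]


# Toy example (illustrative data only).
tokens = ["show", "name", "of", "singer"]
dep_heads = [-1, 0, 3, 1]
schema = {"singer": ["name", "age"]}

graph = build_joint_graph(tokens, dep_heads, schema)
labels = pruning_labels(graph, "SELECT name FROM singer")
```

In a multi-task setup of this kind, a shared encoder would consume `graph`, and an auxiliary classifier would be trained against `labels` alongside the SQL decoder. How the two objectives are weighted is left open here, since the abstract describes the combination only as a heuristic learning method.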