Faster and Better Grammar-based Text-to-SQL Parsing via Clause-level Parallel Decoding and Alignment Loss

Kun Wu, Lijie Wang, Zhenghua Li, Xinyan Xiao
DOI: https://doi.org/10.48550/arXiv.2204.12186
2022-04-26
Abstract: Grammar-based parsers have achieved high performance in the cross-domain text-to-SQL parsing task, but suffer from low decoding efficiency due to the much larger number of actions for grammar selection than that of tokens in SQL queries. Meanwhile, how to better align SQL clauses and question segments has been a key challenge for parsing performance. Therefore, this paper proposes clause-level parallel decoding and alignment loss to enhance two high-performance grammar-based parsers, i.e., RATSQL and LGESQL. Experimental results of two parsers show that our method obtains consistent improvements both in accuracy and decoding speed.
Computation and Language
What problem does this paper attempt to address?
The paper targets two weaknesses of grammar-based text-to-SQL parsers:

1. **Low decoding efficiency**: Grammar-based parsers achieve high performance on cross-domain text-to-SQL parsing, but they decode slowly because the number of grammar-selection actions is much larger than the number of tokens in the SQL query. This is costly in practical applications that require a quick response.
2. **Poor alignment between SQL clauses and question segments**: Aligning SQL clauses with question segments is a key challenge for parsing performance. Existing methods often struggle to capture this alignment when dealing with complex SQL structures.

To address both problems, the paper proposes two strategies, **clause-level parallel decoding** and **alignment loss**, and applies them to two high-performance grammar-based parsers, RATSQL and LGESQL. Experimental results show consistent improvements in both accuracy and decoding speed.

### Specific methods

1. **Clause-level parallel decoding**:
   - SQL clauses are generated in parallel instead of sequentially, which improves decoding efficiency. Each clause is generated independently and no longer depends on the state of the previous clause.
   - The method exploits the loose association between the generation of different clauses to increase decoding speed.
2. **Alignment loss**:
   - A new training loss, the alignment loss, encourages the model to attend to the relevant question segments when generating each clause.
   - With this loss, the model captures the alignment between SQL clauses and question segments more accurately, which improves parsing accuracy.
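The two methods can be illustrated with a minimal Python sketch. It is not the paper's implementation. The clause names, the action lists, and the exact form of the loss (negative log of the attention mass on aligned question tokens) are assumptions chosen for illustration, and the function names `sequential_steps`, `parallel_steps`, and `alignment_loss` are hypothetical. The sketch only shows why parallel clause decoding needs fewer steps, and how a loss can reward attention that falls on the aligned question segment.

```python
import math
from typing import Dict, List


def sequential_steps(clause_actions: Dict[str, List[str]]) -> int:
    """Decoding steps when clauses are generated one after another."""
    return sum(len(actions) for actions in clause_actions.values())


def parallel_steps(clause_actions: Dict[str, List[str]]) -> int:
    """Decoding steps when all clauses are decoded at the same time.

    Every clause advances by one action per step, so the longest
    clause determines the total number of steps.
    """
    return max(len(actions) for actions in clause_actions.values())


def alignment_loss(attention: List[float], aligned: List[int]) -> float:
    """Assumed loss form: -log of the attention mass on aligned tokens.

    attention: attention weights over question tokens (sums to 1).
    aligned:   1 where a question token belongs to the clause's segment.
    """
    mass = sum(a for a, m in zip(attention, aligned) if m)
    return -math.log(mass + 1e-12)  # epsilon avoids log(0)


# Hypothetical action sequences for three clauses of one query.
clause_actions = {
    "SELECT": ["a1", "a2", "a3"],
    "FROM": ["a1", "a2"],
    "WHERE": ["a1", "a2", "a3", "a4"],
}
seq = sequential_steps(clause_actions)  # 3 + 2 + 4 = 9 steps
par = parallel_steps(clause_actions)    # max(3, 2, 4) = 4 steps

# Attention focused on the aligned segment gives a lower loss.
aligned = [0, 1, 1, 0]
focused = alignment_loss([0.1, 0.6, 0.2, 0.1], aligned)
diffuse = alignment_loss([0.4, 0.1, 0.1, 0.4], aligned)
```

In this toy setting the step count drops from 9 to 4. A real decoder also pays for batching and for merging the clauses back into one query, so measured speedups are smaller than the ratio of step counts.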
### Experimental results

- **Improvement in accuracy**: On the Spider dataset, the two strategies raise the accuracy of RATSQL and LGESQL by 0.6% and 0.2%, respectively.
- **Improvement in decoding speed**: Decoding speed increases by 18.9% and 35.5%, respectively. The gain is larger for LGESQL because its grammar is simpler and its action sequences are shorter.

### Conclusion

The clause-level parallel decoding and alignment loss methods proposed in the paper improve both the efficiency and the accuracy of grammar-based text-to-SQL parsing models. These improvements matter for complex, cross-domain SQL query tasks.