Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect

Naihao Deng, Yulong Chen, Yue Zhang
DOI: https://doi.org/10.48550/arXiv.2208.10099
2022-08-22
Abstract: Text-to-SQL has attracted attention from both the natural language processing and database communities because of its ability to convert the semantics in natural language into SQL queries and its practical application in building natural language interfaces to database systems. The major challenges in text-to-SQL lie in encoding the meaning of natural utterances, decoding to SQL queries, and translating the semantics between these two forms. These challenges have been addressed to different extents by the recent advances. However, there is still a lack of comprehensive surveys for this task. To this end, we review recent progress on text-to-SQL for datasets, methods, and evaluation and provide this systematic survey, addressing the aforementioned challenges and discussing potential future directions. We hope that this survey can serve as quick access to existing work and motivate future research.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the main challenges of the text-to-SQL task, which fall into three areas:

1. **Encoding natural language utterances**: extracting the meaning of the user's natural-language question and representing it in a form a computer can process. This is the foundation of any text-to-SQL system.
2. **Translating semantics from natural language to SQL**: mapping the extracted meaning onto an equivalent representation on the SQL side. This is the problem of aligning the semantics of natural language with those of SQL.
3. **Decoding SQL queries**: generating a correct SQL statement from that representation so that it retrieves the answer the user asked for from the database.

The paper also notes that recent work has made progress on each of these challenges, but no comprehensive survey of that progress existed. The authors therefore provide a systematic review of datasets, methods, and evaluation, intended to help researchers quickly get up to speed on existing work and to motivate future research. Beyond summarizing existing techniques, the survey discusses potential future directions for the field.
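To make the three stages concrete, here is a minimal toy sketch in Python. It assumes a hypothetical one-table schema (`singer` with `name`, `country`, `age`) and hard-coded string-matching rules, and is not a method from the survey. The systems the survey reviews learn these mappings rather than hand-writing them. The functions `encode`, `translate`, and `decode` are illustrative names chosen to mirror the three challenges above.

```python
import re
import sqlite3

# Hypothetical toy schema (table -> columns); not taken from the survey.
SCHEMA = {"singer": ["name", "country", "age"]}


def encode(question: str) -> list[str]:
    """Encoding: lowercase the utterance and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def mentions(word: str, tokens: list[str]) -> bool:
    """True if a schema word appears in the tokens, allowing a plural 's'."""
    return word in tokens or word + "s" in tokens


def translate(tokens: list[str], schema: dict[str, list[str]]) -> dict:
    """Translation: link tokens to schema items by string match (a crude
    stand-in for schema linking) and build a small intermediate form."""
    tables = [t for t in schema if mentions(t, tokens)]
    if not tables:
        raise ValueError("no table mentioned in the question")
    table = tables[0]
    columns = [c for c in schema[table] if mentions(c, tokens)]
    where = None
    # Hypothetical hand-written rule: "older than N" becomes age > N.
    if "older" in tokens and "than" in tokens:
        idx = tokens.index("than") + 1
        if idx < len(tokens) and tokens[idx].isdigit():
            where = ("age", ">", int(tokens[idx]))
    return {"table": table, "columns": columns or ["*"], "where": where}


def decode(ir: dict) -> tuple[str, tuple]:
    """Decoding: render the intermediate form as a parameterized SQL string."""
    sql = f"SELECT {', '.join(ir['columns'])} FROM {ir['table']}"
    params: tuple = ()
    if ir["where"] is not None:
        column, op, value = ir["where"]
        sql += f" WHERE {column} {op} ?"
        params = (value,)
    return sql, params


def text_to_sql(question: str) -> tuple[str, tuple]:
    """Run the three stages end to end on one question."""
    return decode(translate(encode(question), SCHEMA))


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding made-up rows for the toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [("Ana", "Spain", 28), ("Ben", "France", 41), ("Chen", "China", 35)],
    )
    return conn


if __name__ == "__main__":
    sql, params = text_to_sql("What are the names of singers older than 30?")
    print(sql, params)  # SELECT name FROM singer WHERE age > ? (30,)
    rows = build_demo_db().execute(sql, params).fetchall()
    print(sorted(rows))  # [('Ben',), ('Chen',)]
```

The decoder emits a `?` placeholder and passes the number separately, so the literal from the question is never spliced into the SQL text. Table and column names come only from the schema dictionary.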