SpCQL: A Semantic Parsing Dataset for Converting Natural Language into Cypher

Aibo Guo, Xinyi Li, Guanchen Xiao, Zhen Tan, Xiang Zhao
DOI: https://doi.org/10.1145/3511808.3557703
2022-01-01
Abstract: The Neo4j query language Cypher enables efficient querying for graphs and has become the most popular graph database language. Due to its complexities, semantic parsing (similar to Text-to-SQL) that translates natural language queries to Cypher becomes highly desirable. We propose the first Text-to-CQL dataset, SpCQL, which contains one Neo4j graph database, 10,000 manually annotated natural language queries and the matching Cypher queries (CQL). Correspondingly, based on this dataset, we define a new semantic parsing task, Text-to-CQL. The Text-to-CQL task differs from the traditional Text-to-SQL task because CQL is more flexible and versatile, especially for schema queries, which brings unprecedented challenges for the translation process. Although current SOTA Text-to-SQL models utilize SQL schema and contents, they do not scale up to large-scale graph databases. Besides, due to the absence in Cypher of the primary and foreign keys that are essential for the multi-table Text-to-SQL task, existing Text-to-SQL models are rendered ineffective in this new task and have to be adapted to work. We propose three baselines based on the Seq2Seq framework and conduct experiments on the SpCQL dataset. The experiments yield undesirable results for existing models, hence pressing for subsequent research that considers the characteristics of CQL. The dataset is available at https://github.com/Guoaibo/Text-to-CQL.
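To make the task concrete, the sketch below pairs one natural language question with a Cypher query and, for contrast, a relational counterpart. Everything in it is an assumption made for illustration: the question, the `Person`/`Movie` labels, the `ACTED_IN` relationship, the table and column names, and the `TextToCqlExample` class are hypothetical and are not taken from SpCQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToCqlExample:
    """One hypothetical (question, Cypher) pair; not taken from SpCQL."""
    question: str
    cypher: str


# Hypothetical graph schema: (:Person)-[:ACTED_IN]->(:Movie). Labels,
# relationship type, and property names are assumptions for illustration.
example = TextToCqlExample(
    question="Which movies did Tom Hanks act in?",
    cypher=(
        "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title"
    ),
)

# A relational counterpart over hypothetical tables. It joins three tables
# through primary and foreign keys, the signal that multi-table Text-to-SQL
# models rely on and that the Cypher pattern above does not expose.
sql_counterpart = (
    "SELECT m.title FROM person p "
    "JOIN acted_in a ON a.person_id = p.id "
    "JOIN movie m ON m.id = a.movie_id "
    "WHERE p.name = 'Tom Hanks'"
)

if __name__ == "__main__":
    print(example.question)
    print(example.cypher)
    print(sql_counterpart)
```

The Cypher query expresses the relationship as a path pattern rather than as key-based joins, which is one way to read the abstract's point that key-dependent Text-to-SQL models need adapting for this task.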