Gar++: Natural Language to SQL Translation with Efficient Generate-and-Rank

Yuankai Fan, Can Huang, Tonghui Ren, Zhenying He, X. Sean Wang, Xianglian Wu, Yue Wang, Jiaming Li, Yifan Yang
DOI: https://doi.org/10.1007/978-981-97-7238-4_26
2024-01-01
Abstract: Web applications depend heavily on databases, yet the conventional database interface often makes it hard for end users to use data efficiently. Many end users want to state their requirements directly and retrieve query results with little effort. Natural Language (NL) Interfaces to Databases serve to make databases accessible to such users. Mainstream approaches typically build language translation models that convert NL queries to SQL queries, whereas a more recent generate-and-rank approach performs the translation in two stages, generation and ranking. Despite yielding superior translation results on the public benchmark, this generate-and-rank approach encounters efficiency issues that may impede its practical application. In this paper, we introduce Gar++, which extends the existing generate-and-rank approach with a more efficient generation procedure and a more robust ranking procedure. Specifically, Gar++ uses a Bloom filter to accelerate the data generation process by reducing unnecessary function calls. Additionally, Gar++ provides a brand-new implementation of the ranking module, specifically the re-ranking model, with enhanced language understanding ability. We evaluate the effectiveness of Gar++ on three public benchmarks, namely GEO, Spider, and Mt-teql. Gar++ achieves overall accuracies of 66.6% on GEO, 80.6% on Spider, and 78.4% on Mt-teql.
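The abstract does not say how Gar++ wires the Bloom filter into generation, so the following is only a minimal Python sketch of the general idea: a compact membership pre-check that lets a generator skip an expensive call for candidates it has probably already handled. The names `BloomFilter`, `expand_candidates`, and `expensive_expand`, as well as the bit-array size and hash count, are illustrative assumptions and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive two base hashes from one digest, then combine them
        # (h1 + i * h2) to obtain num_hashes bit positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def expand_candidates(candidates, expensive_expand):
    """Call expensive_expand only for candidates the filter has not seen.

    Hypothetical helper: a rare false positive would skip a genuinely new
    candidate, which is the accuracy/speed trade-off of this design.
    """
    seen = BloomFilter()
    results = []
    calls = 0
    for cand in candidates:
        if seen.might_contain(cand):
            continue  # probably a duplicate, so skip the expensive call
        seen.add(cand)
        results.append(expensive_expand(cand))
        calls += 1
    return results, calls
```

A Bloom filter never reports a false negative, so every true duplicate is skipped; the cost is a small, tunable false-positive rate that depends on the number of bits and hash functions chosen.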