Hybrid Querying Over Relational Databases and Large Language Models

Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi

2024-08-02

Abstract: Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present the first cross-domain benchmark, SWAN, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and also discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% in execution accuracy and 48.2% in data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
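
The abstract reports execution accuracy without defining it. As a rough illustration only, the sketch below assumes the convention common in text-to-SQL evaluation: a question counts as correct when the rows returned for it match the gold rows, ignoring row order. The function name `execution_accuracy` and the toy rows are hypothetical and are not taken from the paper or from SWAN.

```python
from collections import Counter


def execution_accuracy(predicted, gold):
    """Share of questions whose predicted rows equal the gold rows.

    Assumption: rows are compared as multisets, so row order is ignored
    but duplicate rows still matter. Each result is a list of tuples.
    """
    if len(predicted) != len(gold):
        raise ValueError("need one predicted result per gold result")
    if not gold:
        return 0.0
    matches = sum(Counter(p) == Counter(g) for p, g in zip(predicted, gold))
    return matches / len(gold)


# Toy results for three questions (made up for illustration).
predicted = [[("Arsenal",)], [("Hamilton",), ("Vettel",)], []]
gold = [[("Arsenal",)], [("Vettel",), ("Hamilton",)], [("Senna",)]]

score = execution_accuracy(predicted, gold)
print(f"execution accuracy: {score:.3f}")
```

Here the first two questions match (the second differs only in row order) and the third does not, so the score is two out of three.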

Databases, Computation and Language

What problem does this paper attempt to address?

The paper addresses how to combine large language models (LLMs) with relational databases to answer questions that go beyond the information stored in the database itself (referred to as "beyond-database" questions). Traditionally, database queries are evaluated under the closed-world assumption, meaning they return answers based solely on the data stored in the database. In many cases, however, users need answers that draw on information both inside and outside the database. To this end, the paper makes three contributions:

1. **SWAN Benchmark**: The first cross-domain benchmark of this kind, containing 120 "beyond-database" questions over four real-world databases. These databases cover different domains, such as European football and Formula 1 racing.
2. **HQDL Solution**: An initial approach to answering these questions by integrating large language models with relational databases. HQDL comprises data generation, extraction, and finally the execution of hybrid queries to answer user questions.
3. **Evaluation Results**: The authors evaluate HQDL with state-of-the-art language models such as GPT-4 Turbo and report execution accuracy and data factuality. The results indicate that, despite the challenges, the approach shows potential.

Specifically, HQDL uses zero-shot and few-shot prompts to guide large language models in generating the required data. The experiments show that execution accuracy and data factuality improve when more examples are provided. The paper also discusses limitations of HQDL and suggests directions for future improvement.

In summary, this paper explores how to combine large language models with relational databases to answer questions that require information from both inside and outside the database. It proposes an initial solution and a benchmark to promote further research in this field.
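
The generate, extract, and execute flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `fake_llm_generate` is a hypothetical stand-in for a real LLM call, and the `team` table, its `wins` values, and the `llm_team_city` table are toy names and data invented for the example. The idea shown is that the LLM supplies an attribute the database lacks (a team's home city), the generated rows are materialized as a temporary table, and ordinary SQL then joins stored and generated data.

```python
import sqlite3


def fake_llm_generate(team_names):
    """Hypothetical stand-in for an LLM call: returns (team, home_city) rows."""
    knowledge = {"Arsenal": "London", "Barcelona": "Barcelona"}
    return [(name, knowledge.get(name)) for name in team_names]


def answer_hybrid(conn, llm=fake_llm_generate):
    # 1. Read the key values that are stored in the database.
    teams = [row[0] for row in conn.execute("SELECT name FROM team ORDER BY name")]

    # 2. Ask the (stand-in) LLM for the attribute the database does not store.
    generated = llm(teams)

    # 3. Materialize the generated rows so that SQL can join against them.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS llm_team_city "
        "(name TEXT PRIMARY KEY, home_city TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO llm_team_city VALUES (?, ?)", generated)

    # 4. Run a hybrid query over stored and generated data.
    query = """
        SELECT t.name, t.wins
        FROM team AS t
        JOIN llm_team_city AS g ON g.name = t.name
        WHERE g.home_city = 'London'
    """
    return conn.execute(query).fetchall()


# Toy database (values are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (name TEXT PRIMARY KEY, wins INTEGER)")
conn.executemany("INSERT INTO team VALUES (?, ?)", [("Arsenal", 20), ("Barcelona", 25)])

result = answer_hybrid(conn)
print(result)
```

Keeping the generated rows in a separate temporary table leaves the stored data untouched, which makes it easy to inspect or discard the LLM's output; whether HQDL itself makes this choice is not stated in the text above.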

Querying Large Language Models with SQL

A Hybrid Approach to DBQA

A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases

BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra

Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering

Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation

Evaluating SQL Understanding in Large Language Models

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Relational Database Augmented Large Language Model

Towards Accurate and Efficient Document Analytics with Large Language Models

DB-GPT: Large Language Model Meets Database

Aligning Large Language Models to a Domain-specific Graph Database for NL2GQL

DB-GPT: Empowering Database Interactions with Private Large Language Models

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

A Survey on Employing Large Language Models for Text-to-SQL Tasks

CHESS: Contextual Harnessing for Efficient SQL Synthesis

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Domain-specific Question Answering with Hybrid Search

SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation