NL2KQL: From Natural Language to Kusto Query

Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing

2024-04-16

Abstract: Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetry, and time series on big data analytics platforms. This paper introduces NL2KQL, an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several key components: the Schema Refiner, which narrows down the schema to its most pertinent elements; the Few-shot Selector, which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner, which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs that are valid within a specific database context. To validate NL2KQL's performance, we use an array of online (based on query execution) and offline (based on query parsing) metrics. Through ablation studies, the significance of each framework component is examined, and the datasets used for benchmarking are made publicly available. This work is the first of its kind and is compared with available baselines to demonstrate its effectiveness.

Databases, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses the problem of converting Natural Language Queries (NLQs) into Kusto Query Language (KQL) queries. The research team proposes a framework called NL2KQL, which uses Large Language Models (LLMs) to perform this conversion. The NL2KQL framework includes several key components:

1. **Schema Refiner**: Selects the most relevant elements of the database schema, which reduces potential errors during generation.
2. **Few-shot Selector**: Dynamically selects a small number of examples relevant to the current query, which helps the model understand the query requirements in a specific context.
3. **Query Refiner**: Repairs syntactic and semantic errors in the generated KQL queries so that they are valid in the target Kusto database.

Additionally, the paper introduces a method for generating a large number of synthetic NLQ-KQL pairs that are valid in a specific database context. To validate NL2KQL, the research team uses online metrics (based on query execution) and offline metrics (based on query parsing), and conducts ablation studies to assess the importance of each component. The team also released the first benchmark test set for KQL generation, along with the related data catalog, for other researchers to use.

Overall, the goal of this research is to lower the technical barrier so that more users can interact with data through natural language, especially in scenarios involving semi-structured big data such as logs, telemetry, and time series. In this way, NL2KQL is expected to improve the efficiency and accessibility of data analysis.
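As a rough illustration of how the Schema Refiner and Few-shot Selector described above might feed a prompt, here is a minimal Python sketch. It is not the paper's implementation: the token-overlap (Jaccard) scoring is a stand-in chosen for simplicity, and the table names, column names, function names, and prompt layout are all hypothetical. The LLM call and the Query Refiner step are omitted.

```python
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; CamelCase identifiers such as UserName are split."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Example:
    """One NLQ-KQL pair from a (hypothetical) few-shot pool."""
    nlq: str
    kql: str


def refine_schema(nlq: str, schema: dict[str, list[str]], top_k: int = 1) -> dict[str, list[str]]:
    """Keep the top_k tables whose name and columns best overlap the question.

    Assumption: lexical overlap stands in for whatever relevance measure
    the real Schema Refiner uses.
    """
    q = tokens(nlq)
    ranked = sorted(
        schema,
        key=lambda table: jaccard(q, tokens(table + " " + " ".join(schema[table]))),
        reverse=True,
    )
    return {table: schema[table] for table in ranked[:top_k]}


def select_few_shots(nlq: str, pool: list[Example], top_k: int = 2) -> list[Example]:
    """Pick the top_k pool examples whose question is most similar to nlq."""
    q = tokens(nlq)
    return sorted(pool, key=lambda ex: jaccard(q, tokens(ex.nlq)), reverse=True)[:top_k]


def build_prompt(nlq: str, schema: dict[str, list[str]], shots: list[Example]) -> str:
    """Assemble a plain-text prompt from the refined schema and selected examples."""
    schema_lines = [f"{table}({', '.join(cols)})" for table, cols in schema.items()]
    shot_lines = [f"Question: {ex.nlq}\nKQL: {ex.kql}" for ex in shots]
    return "\n\n".join(
        ["Schema:\n" + "\n".join(schema_lines), *shot_lines, f"Question: {nlq}\nKQL:"]
    )


if __name__ == "__main__":
    # Hypothetical schema and few-shot pool, for illustration only.
    demo_schema = {
        "LoginEvents": ["Timestamp", "UserName", "Result"],
        "NetworkFlows": ["Timestamp", "SourceIp", "DestinationIp", "Bytes"],
    }
    demo_pool = [
        Example("Count login events per user", "LoginEvents | summarize count() by UserName"),
        Example("Show total bytes per source ip", "NetworkFlows | summarize sum(Bytes) by SourceIp"),
    ]
    question = "Count failed login events per user name in the last day"
    refined = refine_schema(question, demo_schema)
    print(build_prompt(question, refined, select_few_shots(question, demo_pool, top_k=1)))
```

In a fuller pipeline, the prompt would be sent to an LLM and the returned KQL passed to a refinement step that parses the query and repairs errors against the target database; those stages depend on tooling not described in this summary, so they are left out here.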

$R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL

Quda: Natural Language Queries for Visual Data Analytics

The Dawn of Natural Language to SQL: Are We Fully Ready?

CatSQL: Towards Real World Natural Language to SQL Applications.

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases

Aligning Large Language Models to a Domain-specific Graph Database for NL2GQL

Chatting with Logs: An exploratory study on Finetuning LLMs for LogQL

Efficient Deployment of Conversational Natural Language Interfaces over Databases

Blar-SQL: Faster, Stronger, Smaller NL2SQL

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

Querying Large Language Models with SQL

UQE: A Query Engine for Unstructured Databases

Interactive Natural Language Question Answering over Knowledge Graphs

E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation

Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries

Querying Knowledge via Multi-Hop English Questions