ChatBI: Towards Natural Language to Complex Business Intelligence SQL

Jinqing Lian, Xinyi Liu, Yingxia Shao, Yang Dong, Ming Wang, Zhang Wei, Tianqi Wan, Ming Dong, Hailin Yan
2024-05-01
Abstract: The Natural Language to SQL (NL2SQL) technology gives non-expert users who are unfamiliar with databases the opportunity to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in actual production systems. Compared to NL2SQL, NL2BI introduces more challenges.
Databases
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses several key challenges in converting natural language into complex business intelligence SQL (NL2BI). It focuses on three main issues:

1. **Multi-turn Dialogue Matching**
   - **Problem Description**: In production systems, users typically interact with NL2BI tools through multi-turn dialogues (MRD), whereas existing NL2SQL methods mainly handle single-turn dialogues (SRD). As a result, existing methods cannot reliably recognize and handle intent changes across turns.
   - **Solution**: The paper frames the problem as a combination of classification and prediction tasks, using pre-trained models (such as ERNIE) to determine whether the current query contains dimensions and columns, and therefore whether it continues a multi-turn dialogue.
2. **Large Number of Columns and Ambiguous Columns**
   - **Problem Description**: In BI scenarios, data tables usually contain many columns, and many of them are ambiguous (i.e., a column can carry several different meanings). This is a challenge for existing methods that rely on large language models (LLMs) for schema linking, because the number of columns can exceed the models' token limits.
   - **Solution**: The paper borrows view technology from the database community and transforms the schema linking problem into a single view selection problem. Smaller, more cost-effective machine learning models select one view, which reduces the number of candidate columns and addresses both the large column count and column ambiguity.
3. **Deficiencies of Existing Processes**
   - **Problem Description**: Existing processes mainly rely on advanced LLMs to generate SQL directly, but these models perform poorly on complex semantics, comparative relationships, and computational relationships, especially in BI scenarios.
   - **Solution**: The paper designs a phased process and introduces the concept of virtual columns. Complex semantics, computations, and comparative relationships are handled outside the LLM, and the final SQL is generated by rule-based methods, which improves both the accuracy and the efficiency of SQL generation.

In summary, the paper proposes the ChatBI system to help non-expert users use SQL for data analysis in production systems, addressing three problems of the NL2BI task: multi-turn dialogue matching, the large number of columns and ambiguous columns, and the deficiencies of existing processes.
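The three-stage flow summarized above can be illustrated with a toy Python sketch. It is not ChatBI's implementation: a keyword matcher stands in for the pre-trained classifier (ERNIE in the paper), view selection uses plain column overlap instead of a trained model, and every view name, column, and virtual-column expression (`revenue_view`, `avg_order_value`, and so on) is hypothetical. The LLM step that would emit the structured intermediate is omitted; the column list is passed straight to the rule-based SQL builder.

```python
from dataclasses import dataclass, field

# Hypothetical views over a wide BI table; each exposes only a few columns,
# including virtual columns registered with that view.
VIEWS = {
    "user_activity_view": {"date", "dau", "new_users", "new_user_ratio"},
    "revenue_view": {"date", "revenue", "orders", "avg_order_value"},
}

# Hypothetical virtual columns: names the LLM may emit, which rules expand into
# SQL expressions so the LLM never writes the arithmetic itself.
VIRTUAL_COLUMNS = {
    "new_user_ratio": "SUM(new_users) / NULLIF(SUM(dau), 0)",
    "avg_order_value": "SUM(revenue) / NULLIF(SUM(orders), 0)",
}

ALL_COLUMNS = set().union(*VIEWS.values())


@dataclass
class DialogueState:
    """Columns carried over from earlier turns of the conversation."""
    columns: list = field(default_factory=list)


def detect_columns(query: str) -> list:
    """Keyword stand-in for the pre-trained classifier: returns the known
    column names that appear verbatim in the query, sorted for determinism."""
    q = query.lower()
    return sorted(c for c in ALL_COLUMNS if c in q)


def resolve_columns(query: str, state: DialogueState) -> list:
    """Multi-turn matching: a query naming no columns is treated as a
    follow-up and inherits the previous turn's columns."""
    found = detect_columns(query)
    if found:
        state.columns = found
    return state.columns


def select_view(columns: list) -> str:
    """Single view selection: the view covering the most requested columns
    (ties go to the first view in VIEWS)."""
    return max(VIEWS, key=lambda name: len(VIEWS[name] & set(columns)))


def build_sql(view: str, columns: list) -> str:
    """Rule-based SQL assembly: always groups by date; virtual columns are
    expanded from VIRTUAL_COLUMNS, other columns are summed."""
    items = []
    for col in columns:
        if col == "date":
            continue
        expr = VIRTUAL_COLUMNS.get(col, f"SUM({col})")
        items.append(f"{expr} AS {col}")
    return f"SELECT date, {', '.join(items)} FROM {view} GROUP BY date"


if __name__ == "__main__":
    state = DialogueState()
    cols = resolve_columns("show revenue and avg_order_value by date", state)
    print(build_sql(select_view(cols), cols))
    # A follow-up naming no columns reuses the previous turn's columns.
    print(resolve_columns("what about last month?", state))
```

Keeping the arithmetic for `avg_order_value` in a rule table rather than in generated text mirrors the motivation given above: the model only has to name a column, and the SQL expression comes from a deterministic rule.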