SiriusBI: Building End-to-End Business Intelligence Enhanced by Large Language Models

Jie Jiang, Haining Xie, Yu Shen, Zihan Zhang, Meng Lei, Yifeng Zheng, Yide Fang, Chunyou Li, Danqing Huang, Wentao Zhang, Yang Li, Xiaofeng Yang, Bin Cui, Peng Chen
2024-11-09
Abstract: The rapid advancement of AI technologies, particularly Large Language Models (LLMs), is establishing a new paradigm for Business Intelligence (BI). Despite the emergence of pioneering work in enhancing BI systems with LLMs, we have identified the following three issues when deployed in real industrial scenarios: interaction limitations, performance bottlenecks, and functionality deficiencies. In this paper, we present SiriusBI, an end-to-end business intelligence system that is designed to address the three issues simultaneously. First, we propose an intelligent and application-oriented module called multi-round dialogue with querying, which aims to overcome the prevalent interaction limitations in current BI solutions. Next, to mitigate the performance bottlenecks caused by scenario migration, we introduce two SQL generation methods that strike a balance between accuracy and deployment costs. Finally, to tackle the practical challenges posed by functionality deficiencies, we develop an end-to-end workflow that covers the entire BI process, ensuring that SiriusBI delivers a robust and complete set of functionalities. As an independent cloud service in Tencent's data platform, SiriusBI has been applied across Tencent's finance, advertising, and cloud sectors, providing services to dozens of enterprise clients. Experiments on real-world datasets and practical applications in industrial BI scenarios demonstrate the practicality and effectiveness of SiriusBI. Notably, SiriusBI achieves accuracy rates of 97% in SQL generation for Tencent Finance, 89% for Tencent Advertisement, and 91% for Tencent Cloud.
Databases
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "SiriusBI: Building End-to-End Business Intelligence Enhanced by Large Language Models" addresses three main problems that arise when LLM-enhanced Business Intelligence (BI) systems are deployed in real industrial scenarios:

1. **Interaction Limitations**:
   - Most research focuses on single-round dialogue (SRD), and even the ChatBI systems that support multi-round dialogue (MRD) remain limited in practice. This restricts how users can express their needs and reduces usability and user experience.
   - Existing solutions can usually handle only fixed question-and-answer formats and cannot adapt to BI scenarios that require in-depth domain knowledge.
2. **Performance Bottlenecks**:
   - Existing NL2SQL methods often perform poorly when moved across application scenarios. For example, a large language model (LLM) fine-tuned on financial-domain data may perform poorly in the advertising domain.
   - Preparing and standardizing data for each professional domain and then fine-tuning a model on it incurs high costs, which are usually impractical in industrial applications.
3. **Functionality Deficiencies**:
   - Traditional BI systems are usually comprehensive data-analysis solutions that cover multiple interconnected modules, such as data storage, SQL generation, and data analysis. Existing LLM-based BI systems, however, lack key functions, including efficient intention understanding, domain-specific knowledge management, and intelligent data analysis.
   - Relying solely on NL2SQL, or supporting multi-round dialogue only in specific scenarios, cannot meet the needs of real BI applications.
### Solutions

To address these challenges, the paper proposes SiriusBI, an end-to-end BI system that uses large language models (LLMs) to enhance each sub-module, improving both the efficiency of data analysis and the user experience. The specific solutions are as follows:

1. **Multi-Round Dialogue with Querying (MRD-Q)**:
   - An intention-clarification module asks follow-up questions to resolve ambiguity in user queries, so the system can identify the user's true intention even if the initial query is incorrect or ambiguous.
   - A Retrieval-Augmented Generation (RAG) framework combines business-domain knowledge with metadata so that the system adapts to different domains.
2. **SQL Generation Strategies**:
   - For domain-specific NL2SQL tasks, an efficient, automated process for constructing training data is proposed to improve LLM performance.
   - For scenarios that require migration, a two-step method combines a semantic model with a SQL model and uses domain-specific knowledge and metadata to keep SQL generation accurate while reducing the cost of migrating between scenarios.
3. **End-to-End Workflow**:
   - A comprehensive end-to-end workflow is provided, including a knowledge-management module and a data-insight module.
   - The knowledge-management module stores, processes, and serves domain-specific knowledge and data, meeting the knowledge requirements of each sub-module.
   - The data-insight module provides traditional data visualization and report generation, and adds automatic data preparation, complex-task planning, and insight-tool application, giving users more comprehensive and in-depth data-analysis support.
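The clarification step and the two-step SQL generation described above can be illustrated with a minimal sketch. It is not SiriusBI's implementation: the `SemanticQuery` class, the keyword-matching `parse_query` stand-in for an LLM semantic model, and the `METRICS`, `DIMENSIONS`, and `TABLE` metadata are all hypothetical names introduced here for illustration. The sketch only shows the control flow: parse the request into an intermediate semantic form, ask a follow-up question if the intent is ambiguous, then render SQL from metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical metadata mapping business terms to SQL expressions and columns.
METRICS = {"revenue": "SUM(amount)", "orders": "COUNT(order_id)"}
DIMENSIONS = {"region": "region", "month": "order_month"}
TABLE = "sales_orders"


@dataclass
class SemanticQuery:
    """Intermediate representation between natural language and SQL."""
    metric: Optional[str] = None
    dimensions: List[str] = field(default_factory=list)


def parse_query(text: str) -> SemanticQuery:
    """Stand-in for an LLM semantic model: naive keyword matching."""
    query = SemanticQuery()
    for word in text.lower().replace("?", "").split():
        if word in METRICS and query.metric is None:
            query.metric = word
        elif word in DIMENSIONS and word not in query.dimensions:
            query.dimensions.append(word)
    return query


def clarifying_question(query: SemanticQuery) -> Optional[str]:
    """Return a follow-up question while the intent is still ambiguous."""
    if query.metric is None:
        return "Which metric do you mean? Options: " + ", ".join(sorted(METRICS))
    return None


def answer_clarification(query: SemanticQuery, answer: str) -> SemanticQuery:
    """Merge the user's follow-up answer into the pending query."""
    extra = parse_query(answer)
    merged = query.dimensions + [d for d in extra.dimensions if d not in query.dimensions]
    return SemanticQuery(metric=query.metric or extra.metric, dimensions=merged)


def to_sql(query: SemanticQuery) -> str:
    """Second step: render the semantic query as SQL using the metadata."""
    if query.metric is None:
        raise ValueError("metric must be resolved before SQL generation")
    cols = [DIMENSIONS[d] for d in query.dimensions]
    select = ", ".join(cols + [f"{METRICS[query.metric]} AS {query.metric}"])
    sql = f"SELECT {select} FROM {TABLE}"
    if cols:
        sql += " GROUP BY " + ", ".join(cols)
    return sql


# Round 1: the request names a dimension but no metric, so the system asks.
pending = parse_query("Show numbers by region")
print(clarifying_question(pending))

# Round 2: the user's answer resolves the ambiguity and SQL can be generated.
resolved = answer_clarification(pending, "revenue")
print(to_sql(resolved))
```

Keeping the semantic form separate from the SQL renderer is what makes migration cheap in this sketch: moving to another domain means swapping the metadata dictionaries, not rewriting the rendering logic.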
### Experimental Results

The experimental results demonstrate the effectiveness and practicality of SiriusBI in real industrial BI scenarios. Specifically, SiriusBI achieves SQL-generation accuracy of 97%, 89%, and 91% in Tencent's finance, advertising, and cloud domains, respectively. With these contributions, SiriusBI addresses the interaction, performance, and functionality shortcomings of existing BI systems and gives users a more efficient and user-friendly data-analysis tool.