Abstract: Large enterprise databases can be complex and messy, obscuring the data semantics needed for analytical tasks. We propose a semantic layer in-between the database and the user as a set of small and easy-to-interpret database views, effectively acting as a refined version of the schema. To discover these views, we introduce a multi-agent Large Language Model (LLM) simulation where LLM agents collaborate to iteratively define and refine views with minimal input. Our approach paves the way for LLM-powered exploration of unwieldy databases.
What problem does this paper attempt to address?
The core problem this paper addresses is how to simplify exploring and understanding complex databases by constructing a semantic layer, so that users, including those without a technical background, can interact with the database more easily and extract valuable insights from it.
### Background of the problem
1. **Complexity of data exploration**:
- To gain insight into a database's content and structure, users typically have to go through a cumbersome data-exploration process.
- Database schemas are often very complex, containing a large number of tables and columns, which makes direct understanding and querying difficult.
2. **Limitations of existing solutions**:
- Current Text-to-SQL solutions let non-technical users query databases in natural language, but they remain imperfect: they struggle with ambiguity in both the data and the queries, with schema complexity, and with related issues.
3. **Role of the semantic layer**:
- Traditionally, knowledge engineers explicitly construct a semantic layer between the database and the user to abstract away certain details and provide clearer data semantics.
- In the era of large language models (LLMs), researchers have tried to bypass this expensive process with LLM-driven natural-language interfaces, but these interfaces still face challenges.
### Methods proposed in the paper
The paper makes two main contributions to solve the above problems:
1. **Defining the semantic layer as a set of streamlined views**:
- The paper defines the semantic layer as a set of database views that are easy to interpret and reuse. Each view is a virtual table that subsequent queries can use like an ordinary table.
- By discovering views that represent entities, their attributes, and the relationships among them, the semantic layer augments the existing hard-to-interpret schema, making it easier to understand and use.
2. **Schema refinement via agent-based programming**:
- A multi-agent system automatically discovers meaningful database views. The agents collaborate through iterative conversations and feedback loops to progressively refine and validate the generated views.
- LLMs inject external knowledge to guide view discovery and help ensure that the generated views are both accurate and useful.
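The two contributions above can be illustrated together with a toy sketch. It is not the paper's implementation: the wide `events` table, the view names `users` and `campaigns`, and the helpers `propose`, `validate`, and `refine` are all hypothetical, and the LLM proposer is replaced by a hard-coded stub. Using Python's built-in `sqlite3` module, it shows a minimal propose/validate feedback loop whose accepted definitions become views that can then be queried like ordinary tables.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical wide table standing in for a messy enterprise schema.
SCHEMA = """
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    user_id INTEGER,
    user_email TEXT,
    user_country TEXT,
    campaign_id INTEGER,
    campaign_name TEXT,
    event_type TEXT,
    event_ts TEXT
);
INSERT INTO events VALUES
    (1, 10, 'a@x.com', 'DE', 100, 'spring_sale', 'email_open', '2024-03-01'),
    (2, 10, 'a@x.com', 'DE', 101, 'summer_sale', 'email_click', '2024-06-02'),
    (3, 11, 'b@y.com', 'US', 100, 'spring_sale', 'email_open', '2024-03-03');
"""


@dataclass
class Candidate:
    name: str  # view name
    sql: str   # SELECT statement that defines the view


def propose(round_no, feedback):
    """Hypothetical stub proposer; an LLM agent would condition on `feedback`."""
    if round_no == 0:
        # First draft uses a column that does not exist, so validation rejects it.
        return [Candidate("users", "SELECT DISTINCT user_id, user_email, country FROM events")]
    return [
        Candidate("users", "SELECT DISTINCT user_id, user_email, user_country FROM events"),
        Candidate("campaigns", "SELECT DISTINCT campaign_id, campaign_name FROM events"),
    ]


def validate(conn, cand):
    """Hypothetical stub critic: accept a candidate only if its SELECT runs."""
    try:
        conn.execute(cand.sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return f"{cand.name}: {exc}"


def refine(conn, max_rounds=3):
    """Repeat propose -> validate until a round produces no errors."""
    accepted, feedback = {}, []
    for round_no in range(max_rounds):
        errors = []
        for cand in propose(round_no, feedback):
            if cand.name in accepted:
                continue
            err = validate(conn, cand)
            if err is None:
                # Register the accepted definition as a view (a virtual table).
                conn.execute(f"CREATE VIEW {cand.name} AS {cand.sql}")
                accepted[cand.name] = cand.sql
            else:
                errors.append(err)
        feedback = errors  # error messages are fed back to the proposer
        if not errors:
            break
    return accepted


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
views = refine(conn)
print(sorted(views))  # ['campaigns', 'users']
# Accepted views can be queried like ordinary tables.
rows = conn.execute("SELECT user_id, user_country FROM users ORDER BY user_id").fetchall()
print(rows)  # [(10, 'DE'), (11, 'US')]
```

In the first round the stub's draft fails with a "no such column" error, which becomes feedback; the second round yields two valid, narrow views. A real system would replace the stubs with LLM agents and use richer checks than "the query runs".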
### Experimental results
The paper demonstrates its method on several commercial databases, with a detailed analysis of synthetic data provided by the Braze customer engagement platform. The multi-agent simulation produces a large number of streamlined views that substantially reduce the width of the original tables while preserving key relationships, effectively decomposing the complex database schema.
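One way to quantify the claim about table width is sketched below. The metrics (mean view width, width reduction, column coverage, and columns shared between views as a proxy for retained join relationships) are assumptions chosen for illustration rather than the paper's reported measures, and the column lists and the `summarize` helper are hypothetical toy data and code, not Braze results.

```python
from collections import Counter

# Hypothetical toy data: one wide source table and the views carved out of it.
ORIGINAL = ["event_id", "user_id", "user_email", "user_country",
            "campaign_id", "campaign_name", "event_type", "event_ts"]
VIEWS = {
    "users": ["user_id", "user_email", "user_country"],
    "campaigns": ["campaign_id", "campaign_name"],
    "events_slim": ["event_id", "user_id", "campaign_id", "event_type", "event_ts"],
}


def summarize(original, views):
    """Compare view widths with the source table and find columns shared across views."""
    widths = [len(cols) for cols in views.values()]
    mean_width = sum(widths) / len(widths)
    covered = set().union(*views.values())
    # A column that appears in two or more views can serve as a join key between them.
    counts = Counter(col for cols in views.values() for col in set(cols))
    return {
        "original_width": len(original),
        "mean_view_width": mean_width,
        "width_reduction": 1 - mean_width / len(original),
        "coverage": len(covered & set(original)) / len(original),
        "join_columns": sorted(col for col, n in counts.items() if n >= 2),
    }


stats = summarize(ORIGINAL, VIEWS)
print(stats["join_columns"])  # ['campaign_id', 'user_id']
print(round(stats["width_reduction"], 3))  # 0.583
```

In this toy case the average view is less than half as wide as the source table, every source column is still reachable through some view, and the shared `user_id` and `campaign_id` columns link the views back together.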
### Summary
The paper's main goal is to make exploring and understanding complex databases simpler and more efficient by constructing a semantic layer. Specifically, it addresses the shortcomings of traditional approaches by defining a set of streamlined database views and using a multi-agent system to discover and refine these views automatically.