Schema Matching with Large Language Models: an Experimental Study

Marcel Parciak, Brecht Vandevoort, Frank Neven, Liesbet M. Peeters, Stijn Vansummeren
2024-07-16
Abstract: Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances.
Databases, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores how to use an existing large language model (LLM) for schema matching. Specifically, the research aims to identify semantic correspondences between two relational schemas using only the names and descriptions of schema elements. The researchers created a new benchmark dataset derived from schemas in the healthcare domain and proposed different task scopes, which vary in how much contextual information the prompt gives the LLM.

### Main Research Questions

1. **How does the quality of schema matching vary between different task scopes and LLM models? How does it compare to baseline methods based on string similarity?**
2. **How decisive is the LLM when expressing opinions on attribute pairs? What is the impact on its reliability and consistency?**
3. **What is the complementarity between the matching results of different task scopes and baseline methods?**
4. **Is it practical and useful to combine different LLM-based or string similarity baseline matchings?**

### Methods

- **Schema Matching Definition**: Schema matching refers to deriving a set of valid 1:1 matches from two given schemas.
- **Benchmark Dataset**: Researchers extracted source and target schemas from the MIMIC-IV dataset and the OHDSI OMOP common data model, manually identifying all semantically valid 1:1 matches as the benchmark.
- **Prompt Engineering**: Four task scopes (1-to-1, 1-to-N, N-to-1, N-to-M) were designed, each including a different amount of schema information in the prompt.
- **Experimental Setup**: For each dataset and task scope, the same prompt was sent to the LLM three times, and majority voting over the three responses was used to reduce the impact of hallucinations. The results were evaluated using F1 scores and decisiveness scores.

### Results

- **F1 Scores**: Most task scopes (except 1-to-1) outperformed the string similarity baseline across multiple datasets. Notably, the N-to-M task scope performed best on GPT-4, achieving the highest average F1 score.
- **Decisiveness**: Increasing contextual information improved the LLM's decisiveness, and newer LLM versions were generally more decisive.
- **Complementarity**: The matching results of different task scopes were partly complementary, and combining multiple task scopes could further improve matching performance.

### Conclusion

The study demonstrates that LLMs have potential in schema matching tasks, especially when only schema element names and descriptions are available. By choosing task scopes carefully, matching quality and decisiveness can be significantly improved. Future research can explore how to combine different task scopes and baseline methods to further optimize schema matching performance.
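
The experimental setup described under Methods (three responses per prompt, majority voting, F1 evaluation against manually identified matches) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the vote labels (`"yes"`, `"no"`, `"unknown"`), the function names, and the example attribute pairs are all assumptions made here, and the reference matches shown are invented for the example rather than taken from the benchmark.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label given by a strict majority of votes, else 'unknown'.

    Assumption: each LLM response has already been parsed into one of
    'yes', 'no', or 'unknown'; the summary does not specify the labels.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "unknown"


def aggregate_matches(responses):
    """Keep the attribute pairs whose majority vote is 'yes'.

    `responses` maps (source_attribute, target_attribute) to the list of
    votes collected from repeated runs of the same prompt.
    """
    return {pair for pair, votes in responses.items()
            if majority_vote(votes) == "yes"}


def f1_score(predicted, reference):
    """F1 of a predicted set of matches against a reference set."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical votes from three runs of the same prompt per attribute pair.
responses = {
    ("patients.gender", "person.gender_concept_id"): ["yes", "yes", "no"],
    ("patients.dod", "death.death_date"): ["yes", "yes", "yes"],
    ("admissions.language", "person.race_concept_id"): ["no", "unknown", "yes"],
    ("patients.anchor_age", "person.year_of_birth"): ["no", "no", "yes"],
}

# Hypothetical reference matches, standing in for the manual benchmark.
reference = {
    ("patients.gender", "person.gender_concept_id"),
    ("patients.dod", "death.death_date"),
    ("patients.anchor_age", "person.year_of_birth"),
}

predicted = aggregate_matches(responses)
score = f1_score(predicted, reference)
print(f"predicted {len(predicted)} matches, F1 = {score:.2f}")
```

In this toy example the pair with three different votes has no strict majority and is treated as undecided, which is one simple way to keep an inconsistent response from producing a match.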