Abstract: Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances.
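To make the idea of a task scope concrete, here is a minimal Python sketch of how schema attributes could be grouped into prompts under each scope. It assumes that N and M range over all attributes of the source and target schema (the paper may instead scope prompts per table). The helpers `task_scope_groups` and `render_prompt`, the prompt wording, and the toy attribute names and descriptions are hypothetical, not taken from the paper or its benchmark.

```python
from itertools import product

# Hypothetical toy schemas: attribute name -> description (not the benchmark).
SOURCE = {
    "subject_id": "Unique identifier of a patient",
    "dob": "Date of birth of the patient",
}
TARGET = {
    "person_id": "Unique identifier of a person",
    "birth_datetime": "Date and time of birth",
    "gender_concept_id": "Gender of the person",
}


def task_scope_groups(scope: str, source: dict, target: dict) -> list[tuple[list[str], list[str]]]:
    """Split attributes into per-prompt groups for one (assumed) task scope.

    Each (source_attrs, target_attrs) group is the context a single prompt sees.
    """
    src, tgt = list(source), list(target)
    if scope == "1-to-1":  # one attribute pair per prompt: least context
        return [([s], [t]) for s, t in product(src, tgt)]
    if scope == "1-to-N":  # one source attribute vs. all target attributes
        return [([s], tgt) for s in src]
    if scope == "N-to-1":  # all source attributes vs. one target attribute
        return [(src, [t]) for t in tgt]
    if scope == "N-to-M":  # all attributes in a single prompt: most context
        return [(src, tgt)]
    raise ValueError(f"unknown task scope: {scope}")


def render_prompt(src_attrs: list[str], tgt_attrs: list[str], source: dict, target: dict) -> str:
    """Render one prompt from names and descriptions only (hypothetical wording)."""
    lines = ["Decide which source attributes correspond to which target attributes."]
    lines.append("Source attributes:")
    lines += [f"- {a}: {source[a]}" for a in src_attrs]
    lines.append("Target attributes:")
    lines += [f"- {a}: {target[a]}" for a in tgt_attrs]
    return "\n".join(lines)


for scope in ["1-to-1", "1-to-N", "N-to-1", "N-to-M"]:
    groups = task_scope_groups(scope, SOURCE, TARGET)
    print(f"{scope}: {len(groups)} prompt(s)")

# Example: the first 1-to-N prompt (subject_id against every target attribute).
print(render_prompt(*task_scope_groups("1-to-N", SOURCE, TARGET)[0], SOURCE, TARGET))
```

On this toy input the four scopes need 6, 2, 3, and 1 prompt(s) respectively: moving from 1-to-1 towards N-to-M means fewer prompts, each carrying more context, which is the dimension along which the abstract says matching quality can suffer at either extreme.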

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper explores how off-the-shelf large language models (LLMs) can be used for schema matching. Specifically, the research aims to identify semantic correspondences between the elements of two relational schemas using only the names and descriptions of those elements. The researchers created a new benchmark dataset from schemas in the healthcare domain and proposed different task scopes: ways of prompting the LLM for schema matching that differ in how much contextual information the prompt contains.

### Main Research Questions

1. **How does schema matching quality vary across task scopes and LLM models, and how does it compare to a baseline based on string similarity?**
2. **How decisive is the LLM when judging attribute pairs, and how does this affect its reliability and consistency?**
3. **To what extent do the matches produced by different task scopes and by the baseline complement each other?**
4. **Is it practical and useful to combine matches from different LLM-based task scopes or from the string similarity baseline?**

### Methods

- **Schema Matching Definition**: Schema matching means deriving the set of valid 1:1 matches between two given schemas.
- **Benchmark Dataset**: The researchers extracted source and target schemas from the MIMIC-IV dataset and the OHDSI OMOP Common Data Model, and manually identified all semantically valid 1:1 matches to serve as the ground truth.
- **Prompt Engineering**: Four task scopes (1-to-1, 1-to-N, N-to-1, N-to-M) were designed, each including a different amount of schema information in the prompt.
- **Experimental Setup**: For each dataset and task scope, the same prompt was sent to the LLM three times, and majority voting over the three answers was used to reduce the impact of hallucinations. Results were evaluated with F1 scores and decisiveness scores.

### Results

- **F1 Scores**: Most task scopes (all except 1-to-1) outperformed the string similarity baseline across multiple datasets. The N-to-M task scope performed best on GPT-4, achieving the highest average F1 score.
- **Decisiveness**: Adding contextual information made the LLM more decisive, and newer LLM versions were generally more decisive. Matching quality, however, suffered both from too little and from too much context.
- **Complementarity**: The matches found by different task scopes were partly complementary, so combining multiple task scopes could further improve matching performance.

### Conclusion

The study shows that LLMs have potential for schema matching, particularly when only schema element names and descriptions are available. Choosing the task scope carefully can substantially improve matching quality and decisiveness. Future research could explore how to combine different task scopes and baseline methods to further improve schema matching.
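The experimental setup and the complementarity finding summarized above can be illustrated with a small Python sketch: majority voting over three LLM answers, F1 against manually identified matches, and the union of LLM and baseline matches. Everything concrete here is an assumption rather than the paper's implementation: the verdict labels `yes`/`no`/`unknown`, treating missing or tied verdicts as `unknown`, using `difflib.SequenceMatcher.ratio()` with a 0.5 threshold as a stand-in string similarity baseline (the paper's baseline may use a different measure), and the toy attribute names loosely modeled on MIMIC-IV and OMOP. The decisiveness score is not modeled, since its exact definition is not given here.

```python
from collections import Counter
from difflib import SequenceMatcher

Pair = tuple[str, str]  # (source attribute, target attribute)


def majority_vote(runs: list[dict[Pair, str]]) -> dict[Pair, str]:
    """Aggregate per-run verdicts ("yes" / "no" / "unknown") by strict majority.

    A pair missing from a run counts as "unknown"; without a strict
    majority the aggregated verdict is "unknown".
    """
    pairs = set().union(*runs)
    result: dict[Pair, str] = {}
    for pair in pairs:
        counts = Counter(run.get(pair, "unknown") for run in runs)
        verdict, votes = counts.most_common(1)[0]
        result[pair] = verdict if votes > len(runs) / 2 else "unknown"
    return result


def accepted(verdicts: dict[Pair, str]) -> set[Pair]:
    """Pairs that the majority declared to be a match."""
    return {pair for pair, v in verdicts.items() if v == "yes"}


def string_baseline(source: list[str], target: list[str], threshold: float = 0.5) -> set[Pair]:
    """Hypothetical name-similarity baseline based on difflib's ratio()."""
    return {
        (s, t)
        for s in source
        for t in target
        if SequenceMatcher(None, s.lower(), t.lower()).ratio() >= threshold
    }


def f1(predicted: set[Pair], truth: set[Pair]) -> float:
    """F1 score of predicted 1:1 matches against the ground-truth matches."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical names, not the paper's benchmark).
source = ["subject_id", "gender", "dob"]
target = ["person_id", "gender_concept_id", "birth_datetime"]
truth = {("subject_id", "person_id"), ("gender", "gender_concept_id"),
         ("dob", "birth_datetime")}

runs = [  # three LLM answers to the same prompt
    {("subject_id", "person_id"): "yes", ("dob", "birth_datetime"): "yes",
     ("gender", "gender_concept_id"): "unknown"},
    {("subject_id", "person_id"): "yes", ("dob", "birth_datetime"): "yes",
     ("gender", "gender_concept_id"): "no"},
    {("subject_id", "person_id"): "yes", ("dob", "birth_datetime"): "no",
     ("gender", "gender_concept_id"): "yes"},
]

llm = accepted(majority_vote(runs))
baseline = string_baseline(source, target)
for name, matches in [("LLM", llm), ("baseline", baseline), ("union", llm | baseline)]:
    print(name, round(f1(matches, truth), 2))
```

On this toy input the majority-voted LLM matches reach an F1 of 0.8 (the `gender` pair ends in a three-way tie and is left undecided), the name-similarity baseline reaches 0.5 because it only catches the lexically similar `gender`/`gender_concept_id` pair, and their union reaches 1.0. This mirrors in miniature why complementary matchers can be worth combining.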

Schema Matching with Large Language Models: an Experimental Study
