In Situ Neural Relational Schema Matcher

Xingyu Du, Gongsheng Yuan, Sai Wu, Gang Chen, Peng Lu
DOI: https://doi.org/10.1109/icde60146.2024.00018
2024-01-01
Abstract: The scarcity of training data prevents a neural network from capturing the diversity and intricacies of schemas, which hinders the generalization of schema-matching models. In this paper, we propose ISResMat, a framework designed to match the schemas of relational tables by fine-tuning a pre-trained language model. We first offer a training data construction method, Pairwise Sampling, which generates the training dataset from table data. Next, we design two loss functions (i.e., Meta-Matching Loss and Agent-Delegating Loss) to learn representations of table columns. With those representations, we calculate matching scores between different table columns to deduce the matching candidates, which provides a novel approach to schema matching. Finally, we present two optimizations (i.e., Matching Rectification Loss and Distribution-Aware Fingerprint) to handle matching cardinality constraints and numerical columns, respectively. ISResMat is a flexible framework supporting instance-based, schema-based, and hybrid matching without significant modification. Experiments on 500+ fabricated and human-curated relation pairs spanning diverse domains and matching scenarios show that our approach outperforms existing state-of-the-art methods.
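The step of scoring column pairs from learned representations can be illustrated with a minimal sketch. This is not ISResMat's implementation: the abstract does not specify the scoring function, so cosine similarity, the 0.5 threshold, the function names `cosine` and `match_candidates`, and the toy embedding vectors below are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_candidates(source, target, threshold=0.5):
    """Score every (source column, target column) pair and keep those at or
    above the threshold, best first. The threshold is an assumed value."""
    result = {}
    for s_name, s_vec in source.items():
        scored = [(t_name, cosine(s_vec, t_vec)) for t_name, t_vec in target.items()]
        kept = [(t_name, score) for t_name, score in scored if score >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[s_name] = kept
    return result


# Hypothetical column embeddings; in practice these would come from a
# fine-tuned language model rather than being written by hand.
source = {"cust_name": [1.0, 0.0, 0.2], "price": [0.0, 1.0, 0.1]}
target = {"customer": [0.9, 0.1, 0.3], "amount": [0.1, 0.8, 0.0]}

candidates = match_candidates(source, target)
for column, matches in candidates.items():
    print(column, matches)
```

Thresholding each pair independently does not enforce any matching cardinality (for example, one-to-one), which is the kind of constraint the abstract says the Matching Rectification Loss addresses.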