Pyramid: A Heterogeneous Data Integration Algorithm Based on Hierarchical Graph

Sining Jiang, Yujun Lan, Weigang Wang, Zhongwen Guo
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447879
2024-01-01
Abstract: The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
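The abstract names the ingredients but not their details, so the following is only a toy sketch of the general idea, not Pyramid's implementation. It assumes a hierarchy of database > table (entity type) > attribute > value. Encoding is bottom-up, so each node's vector mixes its own label with the mean of its children (its "data context"). Matching is top-down, so tables are paired first and attributes are compared only inside an already-matched table pair, which keeps a `name` column of one entity type from being aligned with a column of another. The hashed character-trigram encoder is a deliberate stand-in for the paper's transformer encoder and contrastive training, which are not reproduced here. All names (`Node`, `label_vector`, `encode`, `greedy_match`, `top_down_match`) and the 0.5 threshold are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from math import sqrt

DIM = 64  # size of the toy embedding space (arbitrary choice)


@dataclass
class Node:
    """Node in a hypothetical hierarchy: database > table > attribute > value."""
    name: str
    children: list["Node"] = field(default_factory=list)


def _normalize(v: list[float]) -> list[float]:
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def label_vector(text: str) -> list[float]:
    """Toy stand-in for a learned encoder: hashed character-trigram counts."""
    v = [0.0] * DIM
    s = f"#{text.lower()}#"
    for i in range(len(s) - 2):
        v[zlib.crc32(s[i:i + 3].encode()) % DIM] += 1.0
    return _normalize(v)


def encode(node: Node, cache: dict[int, list[float]]) -> list[float]:
    """Bottom-up: a node's vector mixes its own label with the mean of its children."""
    vec = label_vector(node.name)
    if node.children:
        kids = [encode(c, cache) for c in node.children]
        mean = [sum(col) / len(kids) for col in zip(*kids)]
        vec = _normalize([a + b for a, b in zip(vec, mean)])
    cache[id(node)] = vec
    return vec


def _dot(u: list[float], v: list[float]) -> float:
    # Vectors are unit-length (or zero), so the dot product is the cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def greedy_match(left, right, cache, threshold):
    """Pair each left node with its most similar unused right node (cosine >= threshold)."""
    pairs, used = [], set()
    for a in left:
        candidates = [(_dot(cache[id(a)], cache[id(b)]), i)
                      for i, b in enumerate(right) if i not in used]
        if candidates:
            sim, i = max(candidates)
            if sim >= threshold:
                used.add(i)
                pairs.append((a, right[i]))
    return pairs


def top_down_match(db_a: Node, db_b: Node, threshold: float = 0.5):
    """Top-down: match tables first, then compare attributes only inside matched tables."""
    cache: dict[int, list[float]] = {}
    encode(db_a, cache)
    encode(db_b, cache)
    result = []
    for ta, tb in greedy_match(db_a.children, db_b.children, cache, threshold):
        attrs = greedy_match(ta.children, tb.children, cache, threshold)
        result.append(((ta.name, tb.name), [(x.name, y.name) for x, y in attrs]))
    return result
```

For example, `top_down_match(shop, store)` on two small `Node` trees returns a list of `((table_a, table_b), [(attr_a, attr_b), ...])` entries. In a learned system, `label_vector` is where a trained encoder (for instance, one fitted with a contrastive objective) would plug in; the restriction of attribute comparisons to matched table pairs is the part this sketch is meant to illustrate.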