Towards deep entity resolution via soft schema matching

Chenchen Sun, Derong Shen
DOI: https://doi.org/10.1016/j.neucom.2021.10.106
IF: 6
2022-01-01
Neurocomputing
Abstract: Entity resolution (ER) plays a key role in data preprocessing: it identifies records that correspond to the same real-world entity. Recent years have witnessed a growing trend toward deep learning based ER (deep ER). However, previous deep ER works do not fully utilize schema semantics, since they either use hard schema matching or disregard schema matching altogether. In this work, we flexibly exploit schema matching to enhance deep ER. We define and implement soft schema matching, in which attributes are associated with one another by probabilities. Attribute associations are generated by aggregating token connections in coarse deep ER. We then incorporate soft schema matching into hierarchical attention networks for ER, which tremendously improves resolution quality, especially for complex data and corrupted data. Different attention mechanisms are used for particular sub-tasks in the ER networks: self-attention for contextualization, inter-attention for alignment, and intra-attention for weighting. Finally, comprehensive experiments are run over common data, complex data, and corrupted data. Evaluation results show that our approach surpasses previous works.
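The idea of deriving attribute associations by aggregating token connections can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_schema_matching`, the use of summation as the aggregator, the row normalization, and the toy weights are all assumptions made for illustration. It assumes a token-to-token connection weight is already available (for example, from an attention layer) and that each token is tagged with the attribute it came from.

```python
def soft_schema_matching(token_weights, attrs_a, attrs_b):
    """Aggregate token-level weights into attribute-level probabilities.

    token_weights[i][j]: hypothetical connection weight between token i of
    record A and token j of record B.
    attrs_a[i] / attrs_b[j]: attribute name each token belongs to.
    Returns {attr_a: {attr_b: probability}}, each row summing to 1.
    """
    # Unique attribute names, in first-seen order.
    names_a = list(dict.fromkeys(attrs_a))
    names_b = list(dict.fromkeys(attrs_b))

    # Sum token weights per attribute pair (assumed aggregator: sum).
    totals = {a: {b: 0.0 for b in names_b} for a in names_a}
    for i, a in enumerate(attrs_a):
        for j, b in enumerate(attrs_b):
            totals[a][b] += token_weights[i][j]

    # Normalize each row so the associations read as probabilities.
    assoc = {}
    for a in names_a:
        row_sum = sum(totals[a].values())
        if row_sum > 0:
            assoc[a] = {b: totals[a][b] / row_sum for b in names_b}
        else:
            # No evidence for this attribute: fall back to a uniform row.
            assoc[a] = {b: 1.0 / len(names_b) for b in names_b}
    return assoc


# Toy example: two records with differently named attributes.
attrs_a = ["title", "title", "brand"]
attrs_b = ["name", "name", "maker"]
weights = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]
assoc = soft_schema_matching(weights, attrs_a, attrs_b)
for attr, row in assoc.items():
    print(attr, {b: round(p, 3) for b, p in row.items()})
```

In contrast to hard schema matching, which would pair `title` only with `name`, every attribute pair keeps a nonzero probability here, so a value that ends up in the wrong attribute can still contribute to the match decision.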
computer science, artificial intelligence