GNEM: A Generic One-to-Set Neural Entity Matching Framework

Runjin Chen,Yanyan Shen,Dongxiang Zhang
DOI: https://doi.org/10.1145/3442381.3450119
2021-01-01
Abstract:Entity Matching is a classic research problem in any data analytics pipeline, aiming to identify records referring to the same real-world entity. It plays an important role in data cleansing and integration. Advanced entity matching techniques focus on extracting syntactic or semantic features from record pairs via complex neural architectures or pre-trained language models. However, the performances always suffer from noisy or missing attribute values in the records. We observe that comparing one record with several relevant records in a collective manner allows each pairwise matching decision to be made by borrowing valuable insights from other pairs, which is beneficial to the overall matching performance. In this paper, we propose a generic one-to-set neural framework named GNEM for entity matching. GNEM predicts matching labels between one record and a set of relevant records simultaneously. It constructs a record pair graph with weighted edges and adopts the graph neural network to propagate information among pairs. We further show that GNEM can be interpreted as an extension and generalization of the existing pairwise matching techniques. Extensive experiments on real-world data sets demonstrate that GNEM consistently outperforms the existing pairwise entity matching techniques and achieves up to 8.4% improvement on F1-Score compared with the state-of-the-art neural methods.
What problem does this paper attempt to address?