Information Networks Based Multi-semantic Data Embedding for Entity Resolution

Chenchen Sun, Derong Shen, Tiezheng Nie
DOI: https://doi.org/10.1007/978-3-031-00129-1_2
2022-01-01
Abstract: Entity resolution (ER) is an ongoing topic in data integration and data governance that attracts considerable attention from multiple research fields. Recently, deep learning techniques have been widely applied to ER. We focus on ER with graph-based multi-semantic data embedding. In ER, data with attributes cannot be represented well by common word embeddings from natural language processing. In this work, data with attributes are modeled as a family of multi-type bipartite information networks, each of which captures a specific type of semantics in the data. On this basis, multi-semantic embeddings of the data are learned collectively across the family of information networks. In particular, a novel method is introduced to learn similarity-based bipartite network embeddings. The resulting tailored data embeddings are fed into a flexible hierarchical ER framework, which outputs ER classification distributions. Our approach is comprehensively evaluated on a group of datasets, and the results demonstrate its effectiveness.
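To make the idea of "a family of bipartite information networks, each capturing one type of semantics" more concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: it does not learn embeddings at all. Instead it builds one record-token bipartite network per attribute and compares two records by the Jaccard overlap of their neighborhoods in each network, a deliberately simple stand-in for the paper's learned similarity-based embeddings. The toy records and the names `build_networks`, `jaccard`, and `similarity_vector` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy records: record id -> {attribute name -> string value}.
records: dict[str, dict[str, str]] = {
    "a1": {"title": "iPhone 13 Pro 128GB", "brand": "Apple"},
    "b1": {"title": "Apple iPhone 13 Pro 128GB", "brand": "Apple Inc"},
    "b2": {"title": "Galaxy S21 Ultra", "brand": "Samsung"},
}


def build_networks(recs: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Build one bipartite network per attribute (a hypothetical helper).

    In each network, one side holds record ids and the other holds the
    lower-cased whitespace tokens seen in that attribute; the returned
    mapping stores each record's neighbor set of token nodes.
    """
    networks: dict[str, dict[str, set[str]]] = {}
    for rid, rec in recs.items():
        for attr, value in rec.items():
            neighbors = networks.setdefault(attr, {}).setdefault(rid, set())
            neighbors.update(value.lower().split())
    return networks


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two neighbor sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_vector(networks: dict[str, dict[str, set[str]]], r1: str, r2: str) -> list[float]:
    """One similarity score per attribute network, in sorted attribute order."""
    return [
        jaccard(networks[attr].get(r1, set()), networks[attr].get(r2, set()))
        for attr in sorted(networks)
    ]


nets = build_networks(records)
print(similarity_vector(nets, "a1", "b1"))  # [brand score, title score]
print(similarity_vector(nets, "a1", "b2"))
```

Each position in the resulting vector reflects one semantic view of the data, so a pair that matches on `title` but only partly on `brand` still yields a distinguishable signal. In the paper, learned embeddings and a hierarchical ER framework take the place of these hand-computed overlaps.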