Research on Entity Matching Based on Multiple Heterogeneous Data
Lingyang WANG, Qinkuang CHEN, Lidan SHOU, Ke CHEN
DOI: https://doi.org/10.3778/j.issn.1002-8331.1807-0153
2019-01-01
Abstract: In recent years, many scholars have proposed solutions to the entity matching problem for multi-source heterogeneous data. However, these methods usually focus on entity matching under semantic frameworks such as RDFS or OWL. In addition, when matching entities across multiple data sources, most current methods reduce the task to matching between two data sources. Such methods not only have high computational complexity but also fail to analyze entity data from multiple aspects. To address these issues, this paper proposes an entity matching method that uses the names, attributes, and context information commonly available for entities to construct multiple indexes, which reduces space complexity and generates high-quality candidate sets. The paper also proposes a method for calculating entity similarity, which effectively determines whether an entity pair matches. Based on the weights and mutual exclusion relations between entities, it further proposes an optimization algorithm based on graph partitioning that places equivalent entities in the same set. Experiments are conducted on real-world datasets of brand and character categories in the business domain, and the results show that the method achieves good improvements.
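The three stages named in the abstract (multi-index candidate generation, similarity calculation, and grouping under mutual exclusion) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the records, the field names, the weights, and the threshold are all hypothetical; the similarity is a plain weighted sum of Jaccard and attribute-agreement scores; the mutual exclusion rule is assumed to be "two entities from the same source are never equivalent"; and a simple greedy merge stands in for the graph-partitioning optimization.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records from three sources (A, B, C).
ENTITIES = {
    "a1": {"source": "A", "name": "Acme Corp", "attrs": {"country": "US"},
           "context": {"tools", "hardware"}},
    "b1": {"source": "B", "name": "ACME Corporation", "attrs": {"country": "US"},
           "context": {"hardware", "retail"}},
    "c1": {"source": "C", "name": "Acme", "attrs": {"country": "US"},
           "context": {"tools"}},
    "a2": {"source": "A", "name": "Apex Foods", "attrs": {"country": "FR"},
           "context": {"food"}},
}
WEIGHTS = {"name": 0.5, "attr": 0.2, "context": 0.3}  # assumed weights
THRESHOLD = 0.4                                       # assumed match threshold


def tokens(name):
    return set(name.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_pairs(entities):
    # One inverted index per signal: name tokens, attribute values, context terms.
    indexes = {"name": defaultdict(set), "attr": defaultdict(set),
               "context": defaultdict(set)}
    for eid, e in entities.items():
        for t in tokens(e["name"]):
            indexes["name"][t].add(eid)
        for k, v in e["attrs"].items():
            indexes["attr"][(k, v)].add(eid)
        for c in e["context"]:
            indexes["context"][c].add(eid)
    # Only entities sharing a bucket in some index are ever compared.
    pairs = set()
    for index in indexes.values():
        for bucket in index.values():
            pairs.update(combinations(sorted(bucket), 2))
    return pairs


def similarity(a, b, weights):
    shared = set(a["attrs"]) & set(b["attrs"])
    attr_sim = (sum(a["attrs"][k] == b["attrs"][k] for k in shared) / len(shared)
                if shared else 0.0)
    return (weights["name"] * jaccard(tokens(a["name"]), tokens(b["name"]))
            + weights["attr"] * attr_sim
            + weights["context"] * jaccard(a["context"], b["context"]))


def group_entities(entities, weights, threshold):
    scored = sorted(
        ((similarity(entities[x], entities[y], weights), x, y)
         for x, y in candidate_pairs(entities)),
        reverse=True,
    )
    group_of = {eid: {eid} for eid in entities}  # each entity starts alone
    for score, x, y in scored:
        gx, gy = group_of[x], group_of[y]
        if score < threshold or gx is gy:
            continue
        # Assumed exclusion rule: one group holds at most one entity per source.
        sources_x = {entities[e]["source"] for e in gx}
        sources_y = {entities[e]["source"] for e in gy}
        if sources_x & sources_y:
            continue
        merged = gx | gy
        for e in merged:
            group_of[e] = merged
    unique = {frozenset(g) for g in group_of.values()}
    return sorted(sorted(g) for g in unique)


groups = group_entities(ENTITIES, WEIGHTS, THRESHOLD)
print(groups)
```

Processing pairs in descending similarity order means the strongest evidence decides group membership first, and the exclusion check then blocks any weaker merge that would place two records from one source in the same set. A real implementation would replace this greedy pass with the optimization over the weighted graph that the paper describes.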