A temporal record matching based on entity evolution
Hong Liu, Derong Shen, Yue Kou, Tiezheng Nie, Ge Yu
DOI: https://doi.org/10.13232/j.cnki.jnju.2017.06.001
2017-01-01
Abstract: Entity resolution, also known as record linkage, judges whether two records from one or more data sources refer to the same entity. In data integration, it is widely used for data cleaning, deduplication, and similarity joins, and it also applies to census data, citation recognition, web search, and plagiarism detection. In reality, however, entity attributes change over time. Two records with different attribute values do not necessarily belong to different entities; conversely, two records with identical attribute values do not necessarily refer to the same entity. This motivates the problem of temporal record linkage, which aims to link records that carry time stamps. Most state-of-the-art methods propose temporal models to capture entity evolution, but these models suffer from low accuracy and high computation cost when solving temporal record linkage. In this paper, we first present a novel temporal model for capturing entity evolution. We then present a two-stage fast clustering algorithm. Finally, experimental results on three real-world datasets demonstrate that our temporal model captures entity evolution better and that our clustering algorithm is faster and more accurate in solving temporal record linkage.
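The intuition that attribute agreement or disagreement becomes weaker evidence as the time gap between two records grows can be illustrated with a toy time-aware similarity. This is only a sketch under stated assumptions and is not the temporal model proposed in the paper: the function name `temporal_similarity`, the exponential half-life decay, the `half_life_years` parameter, and the record layout (a dict with a numeric `year` plus attribute values) are all hypothetical choices made for illustration.

```python
def temporal_similarity(rec_a, rec_b, half_life_years=5.0):
    """Toy time-aware similarity in [0, 1] between two timestamped records.

    Assumption (not from the paper): each record is a dict with a numeric
    'year' key plus arbitrary attribute keys, and evidence decays
    exponentially with the time gap between the two records.
    """
    gap = abs(rec_a["year"] - rec_b["year"])
    # Hypothetical decay: the weight of evidence halves every half_life_years.
    decay = 0.5 ** (gap / half_life_years)

    # Compare only the attributes that both records share.
    attrs = [k for k in rec_a if k != "year" and k in rec_b]
    if not attrs:
        return 0.0

    total = 0.0
    for key in attrs:
        if rec_a[key] == rec_b[key]:
            # Agreement is strong evidence when the records are close in
            # time and drifts toward the neutral value 0.5 as the gap grows.
            total += 0.5 + 0.5 * decay
        else:
            # Disagreement is strong counter-evidence when the records are
            # close in time and is forgiven as the gap grows, since the
            # attribute may simply have changed.
            total += 0.5 - 0.5 * decay
    return total / len(attrs)


# Two records that disagree on affiliation.
r1 = {"year": 2000, "affiliation": "University A"}
r2 = {"year": 2000, "affiliation": "University B"}
r3 = {"year": 2010, "affiliation": "University B"}

same_time = temporal_similarity(r1, r2)   # disagreement at the same time
ten_years = temporal_similarity(r1, r3)   # disagreement ten years apart
```

With these inputs, the disagreement ten years apart scores higher than the disagreement in the same year, reflecting that an affiliation may have changed in between. A real temporal model would typically learn such decay behavior per attribute from data rather than fix a single half-life.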