Rule-Based Entity Resolution on Database with Hidden Temporal Information.
Hongzhi Wang, Xiaoou Ding, Jianzhong Li, Hong Gao
DOI: https://doi.org/10.1109/tkde.2018.2816018
IF: 9.235
2018-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: In this paper, we address the problem of rule-based entity resolution on imprecise temporal data. Entity resolution (ER) has been widely explored in the research community, but the problem on temporal data, especially when timestamps are unavailable, has not yet been studied well. As time elapses, records that refer to the same entity but were observed in different time periods may differ. Beyond traditional similarity-based ER approaches, carefully exploring several data quality rules, e.g., matching dependencies and data currency, yields much information that helps to cope with this problem. We use such rules to derive the time order of temporal records and the trend of their attributes' evolution over time. Specifically, we first partition records into smaller blocks and then, by exploring data currency constraints, propose a two-step temporal clustering approach consisting of skeleton clustering and banding clustering. Experimental results on both real and synthetic data show that our entity resolution method achieves both high accuracy and efficiency on datasets with hidden temporal information.
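
As a rough illustration of the first two ideas in the abstract (blocking, then using a currency constraint to recover a time order when no timestamps exist), a minimal Python sketch might look as follows. The record fields, the blocking key, and the `STATUS_ORDER` constraint are assumptions made for illustration and are not taken from the paper; the skeleton and banding clustering steps are not shown.

```python
from collections import defaultdict

# Hypothetical records with no timestamps (illustrative data only).
records = [
    {"id": 2, "name": "J. Smith", "status": "married", "city": "Boston"},
    {"id": 1, "name": "J. Smith", "status": "single", "city": "Boston"},
    {"id": 3, "name": "A. Jones", "status": "single", "city": "Austin"},
]

def block_by(recs, key):
    """Group records into blocks so that only records sharing a key are compared."""
    blocks = defaultdict(list)
    for r in recs:
        blocks[key(r)].append(r)
    return dict(blocks)

# Assumed currency constraint: a "single" value is older than a "married" value.
STATUS_ORDER = {"single": 0, "married": 1}

def currency_order(block):
    """Order the records of one block from least to most current."""
    return sorted(block, key=lambda r: STATUS_ORDER[r["status"]])

blocks = block_by(records, key=lambda r: r["name"].lower())
ordered = {k: [r["id"] for r in currency_order(b)] for k, b in blocks.items()}
```

Here the currency rule alone places record 1 before record 2 within the same block, which is the kind of hidden temporal information the abstract refers to.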