Cleaning Relations Using Knowledge Bases.

Shuang Hao,Nan Tang,Guoliang Li,Jian Li
DOI: https://doi.org/10.1109/icde.2017.141
2017-01-01
Abstract:We study the data cleaning problem of detecting and repairing wrong relational data, as well as marking correct data, using well curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB. The main invention is that, a DR simultaneously models two opposite semantics of a relation using types and relationships in a KB: the positive semantics that explains how attribute values are linked to each other in correct tuples, and the negative semantics that indicates how wrong attribute values are connected to other correct attribute values within the same tuples. Naturally, a DR can mark correct values in a tuple if it matches the positive semantics. Meanwhile, a DR can detect/repair an error if it matches the negative semantics. We study fundamental problems associated with DRs, e.g., rule generation and rule consistency. We present efficient algorithms to apply DRs to clean a relation, based on rule order selection and inverted indexes. Extensive experiments, using both real-world and synthetic datasets, verify the effectiveness and efficiency of applying DRs in practice.
What problem does this paper attempt to address?