Swapping Repair for Misplaced Attribute Values

Yu Sun, Shaoxu Song, Chen Wang, Jianmin Wang
DOI: https://doi.org/10.1109/icde48307.2020.00068
2020-01-01
Abstract: Misplaced data in a tuple are prevalent, e.g., the value "Passport" is misplaced in the passenger-name attribute when it should belong to the travel-document attribute instead. While repairing in-attribute errors has been widely studied, i.e., repairing the error with other values in the attribute domain, misplacement errors, where the true value is simply misplaced in some other attribute of the same tuple, are surprisingly untouched. For instance, the true passenger-name is in fact misplaced in the travel-document attribute of the record. In this sense, we need a novel swapping repair model (to swap the misplaced passenger-name and travel-document values "Passport" and "John Adam" in the same tuple). Determining a proper swapping repair, however, is non-trivial. The minimum change criterion, which evaluates the distance between the values before and after the swapping repair, is meaningless here, since the values come from different attribute domains. Intuitively, one may examine whether the swapped value ("John Adam") is similar to other values in the corresponding attribute domain (passenger-name). Taking a holistic view of all (swapped) attributes, we propose to evaluate the likelihood of a swapping repaired tuple by studying its distances (similarity) to its neighbors. The rationale for distance likelihood draws on the Poisson process of nearest-neighbor appearance. The optimum repair problem is to find a swapping repair with the maximum likelihood on distances. Experiments over datasets with real-world misplaced attribute values demonstrate the effectiveness of our proposal in repairing misplacement.
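The idea of scoring a swapped tuple by its distances to neighbors can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract's likelihood model (based on a Poisson process of nearest-neighbor appearance) is replaced here by a much simpler stand-in, the mean distance to the k nearest reference tuples. The function names, the string metric (`difflib.SequenceMatcher`), the toy reference tuples, and the restriction to pairwise swaps are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def value_distance(a, b):
    """String distance in [0, 1]; 0 means identical (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def tuple_distance(t, u):
    """Sum of per-attribute distances between two tuples of equal arity."""
    return sum(value_distance(x, y) for x, y in zip(t, u))


def knn_score(t, reference, k=2):
    """Mean distance from t to its k nearest reference tuples.

    Lower is better. This is a simplified stand-in for the paper's
    distance likelihood, not the Poisson-based model itself.
    """
    nearest = sorted(tuple_distance(t, r) for r in reference)[:k]
    return sum(nearest) / len(nearest)


def candidate_swaps(t):
    """Yield the unchanged tuple and every tuple with two attributes swapped."""
    yield tuple(t)
    for i, j in combinations(range(len(t)), 2):
        s = list(t)
        s[i], s[j] = s[j], s[i]
        yield tuple(s)


def best_swap_repair(t, reference, k=2):
    """Return the candidate whose k-NN score is smallest."""
    return min(candidate_swaps(t), key=lambda c: knn_score(c, reference, k))


# Hypothetical reference tuples: (passenger-name, travel-document).
reference = [
    ("Alice Smith", "Passport"),
    ("Bob Jones", "Passport"),
    ("Carol White", "ID Card"),
]
dirty = ("Passport", "John Adam")
repaired = best_swap_repair(dirty, reference)
print(repaired)
```

In this toy setting the swapped candidate wins because its travel-document value matches existing "Passport" entries exactly, while the unswapped tuple is far from every reference tuple in both attributes. Enumerating only pairwise swaps keeps the sketch short; a tuple with several misplaced values would need a wider search.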