Repairing Functional Dependency Violations In Distributed Data

Qing Chen,Zijing Tan,Chu He,Chaofeng Sha,Wei Wang
DOI: https://doi.org/10.1007/978-3-319-18120-2_26
2015-01-01
Abstract:One of the problems central to data consistency is data repairing. Given a database D violating a set Sigma of data dependencies as data quality rules, it aims to modify D for a new relation D' satisfying Sigma. When D is a centralized database, a host of methods have been provided to address this problem. In practice, a database may be fragmented and distributed to multiple sites, which is advocated by distributed systems for better scalability and is readily supported by commercial systems. This paper makes a first effort to develop techniques for repairing functional dependency violations in a horizontally partitioned database. (1) Based on a message-passing distributed computing model and two complexity measures (parallel time and data shipment) for distributed algorithms, we study data repairing with equivalence classes in the distributed setting. We show that it is NP-complete to build equivalence classes when the data is horizontally partitioned, and when we aim to minimize either data shipment or parallel computation time. (2) Despite the intractability, we propose efficient distributed algorithms and optimization techniques for data repairing based on equivalence classes. (3) We experimentally verify the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
What problem does this paper attempt to address?