Abstract: In banks, governments, and Internet companies, inconsistent data often arise while information systems collect, process, and update data, owing to human error or equipment faults. Inconsistent data make it impossible to obtain correct information and reduce the usability of the data. In data-intensive enterprises such problems can be fatal and cause huge economic losses. Moreover, inconsistent data in databases are very difficult to clean, especially data governed by conditional functional dependencies with built-in predicates (CFDPs), because such data tend to admit more candidate repair values. To address incomplete detection and difficult repair of inconsistent data containing CFDPs, we propose, respectively, a dependency lifting algorithm (DLA) based on the maximum dependency set (MDS) and a repair algorithm (C-Repair) that integrates minimum cost and attribute correlation. In detection, we find recessive dependencies in the original dependency set to obtain the MDS and improve the original algorithm by dynamic domain adjustment, which extends its applicability to continuous attributes and improves detection accuracy. In repair, we first build a priority queue (PQ) of the elements to be repaired, based on the minimum-cost idea, and select a candidate element from it; then we treat the corresponding conflict-free instance ($I_{nv}$) as the training set to learn the correlation among attributes, and compute the weighted distance (WDis) between the tuple of the candidate element and the other tuples in $I_{nv}$ according to that correlation; lastly, we repair the element based on the WDis, re-compute the PQ after each repair round to improve efficiency, and mark repaired elements with a label, flag, to ensure convergence.
In a comparative experiment, we compare the DLA with the CFDPs-based algorithm, and C-Repair with cost-based and interpolation-based algorithms, on a simulated instance and a real instance. The experimental results show that the DLA and C-Repair algorithms achieve better detection and repair ability, at a higher time cost.
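
The repair loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' C-Repair implementation, and it rests on several assumptions that the abstract does not state: all attributes are numeric, the attribute correlation is taken to be the absolute Pearson correlation measured on the conflict-free tuples, the weighted distance is a correlation-weighted Euclidean distance, the repair value is copied from the nearest conflict-free tuple, and the cost of a cell comes from a caller-supplied function. The names `abs_pearson`, `correlation_weights`, `weighted_distance`, `repair`, and `cost` are hypothetical.

```python
import heapq
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else abs(cov) / sqrt(vx * vy)


def correlation_weights(clean, target):
    """Weight each non-target attribute by its correlation with `target`
    in the conflict-free tuples (assumed correlation measure)."""
    target_col = [t[target] for t in clean]
    return {a: abs_pearson([t[a] for t in clean], target_col)
            for a in clean[0] if a != target}


def weighted_distance(t, u, weights):
    """Correlation-weighted Euclidean distance over the non-target attributes."""
    return sqrt(sum(w * (t[a] - u[a]) ** 2 for a, w in weights.items()))


def repair(tuples, suspect_cells, clean, cost):
    """Repair suspect cells in place, cheapest first; return the repair order.

    tuples: list of dicts; suspect_cells: set of (row_index, attribute);
    clean: conflict-free tuples; cost: hypothetical callable, cell -> number.
    """
    flagged = set()  # repaired cells are never revisited, so the loop terminates
    order = []
    while True:
        # Rebuild the priority queue each round from the cells still unrepaired.
        pending = [(cost(c), c) for c in suspect_cells if c not in flagged]
        if not pending:
            return order
        heapq.heapify(pending)
        _, (row, attr) = heapq.heappop(pending)
        weights = correlation_weights(clean, attr)
        nearest = min(clean,
                      key=lambda u: weighted_distance(tuples[row], u, weights))
        tuples[row][attr] = nearest[attr]  # copy the value from the closest clean tuple
        flagged.add((row, attr))
        order.append((row, attr))


# Made-up data: tax tracks salary exactly in the conflict-free tuples.
clean = [
    {"salary": 1000, "tax": 100, "age": 30},
    {"salary": 2000, "tax": 200, "age": 50},
    {"salary": 3000, "tax": 300, "age": 40},
]
rows = [{"salary": 2900, "tax": 900, "age": 50}]
order = repair(rows, {(0, "tax")}, clean, cost=lambda cell: 1)
print(rows[0]["tax"])  # 300
```

In this toy instance the suspect `tax` value is replaced by the value from the conflict-free tuple whose strongly correlated attribute, `salary`, is closest. A realistic version would normalize attribute scales before computing distances and would derive the cost from the violated dependencies rather than from a constant.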

Automatic Data Repair: Are We Ready to Deploy?

The Future Can’t Help Fix the Past: Assessing Program Repair in the Wild

Assessing Data Quality Within Available Context

On Reliability of Patch Correctness Assessment.

Web-ADARE: A Web-Aided Data Repairing System

Automatic Weighted Matching Rectifying Rule Discovery for Data Repairing

CrowdAidRepair: A Crowd-Aided Interactive Data Repairing Method.

Repair Diversification: A New Approach for Data Repairing

Data repair of density-based data cleaning approach using conditional functional dependencies

DeepRepair: Style-Guided Repairing for Deep Neural Networks in the Real-World Operational Environment

Pattern-Driven Data Cleaning

A Novel Cost-Based Model for Data Repairing

Repairing Deep Neural Networks: Fix Patterns and Challenges

Towards Explainable Automated Data Quality Enhancement without Domain Knowledge

Inconsistent Data Cleaning Based on the Maximum Dependency Set and Attribute Correlation

AUTOMATED EVALUATION AND RATING OF PRODUCT REPAIRABILITY USING ARTIFICIAL INTELLIGENCE-BASED APPROACHES

A Survey of Learning-based Automated Program Repair

Fast Automated Abstract Machine Repair Using Simultaneous Modifications and Refactoring.

Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods