Uncertain Knowledge Representation And Inference For Tracing Errors In Uncertain Data

Kun Yue, Weiyi Liu, Hao Wu, Dapeng Tao, Ming Gao
2018-01-01
Abstract: Data in probabilistic databases may not be absolutely correct and, worse, may be erroneous. Many existing data-cleaning methods can detect errors in traditional databases, but they offer little guidance for locating errors in probabilistic databases, especially databases with complex correlations among data. In this chapter, we present a method for tracing errors in probabilistic databases that adopts a Bayesian Network (BN) as the framework for representing correlations among data. We first develop techniques to construct an augmented Bayesian Network (ABN) for an anomalous query, which represents the correlations among the input data, intermediate data, and output data of the query execution. Inspired by the notion of blame in causal models, we then define a notion of blame for ranking candidate errors. Next, we provide an efficient method for computing the degree of blame of each candidate error based on probabilistic inference over the ABN. Experimental results show the effectiveness and efficiency of our method.
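The abstract does not give the ABN construction or the blame formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It hand-builds a toy network over four hypothetical input tuples `t1` to `t4` (with `t2` correlated to `t1` through a CPT), two deterministic intermediate join results `r1` and `r2`, and an output node `o`; the anomalous answer is modeled as observing `o = True`. As a stand-in for the chapter's blame measure, it computes an expected degree of responsibility in the spirit of Chockler and Halpern, using a simplified version of the Halpern-Pearl counterfactual conditions (contingencies range over input nodes only) and exact inference by enumeration. All names, probabilities, and helper functions are hypothetical.

```python
from itertools import combinations, product

# Hypothetical toy ABN: all node names and probabilities are illustrative.
# Input tuples t1..t4 are Boolean "tuple is present" nodes; t2 depends on t1
# through a CPT, modeling a correlation among input data.
P_T1 = 0.9
P_T2_GIVEN_T1 = {True: 0.8, False: 0.3}
P_T3 = 0.7
P_T4 = 0.2
INPUTS = ["t1", "t2", "t3", "t4"]
INTERMEDIATES = ["r1", "r2"]


def bern(p, value):
    """Probability that a Bernoulli(p) node takes `value`."""
    return p if value else 1.0 - p


def joint(w):
    """Joint probability of an input assignment (BN chain rule)."""
    return (bern(P_T1, w["t1"]) * bern(P_T2_GIVEN_T1[w["t1"]], w["t2"])
            * bern(P_T3, w["t3"]) * bern(P_T4, w["t4"]))


def evaluate(w, overrides=None):
    """Deterministic query nodes: joins r1 = t1 AND t2, r2 = t3 AND t4, and
    output o = r1 OR r2. `overrides` forces intermediate nodes to fixed values."""
    overrides = overrides or {}
    r1 = overrides.get("r1", w["t1"] and w["t2"])
    r2 = overrides.get("r2", w["t3"] and w["t4"])
    return {"r1": r1, "r2": r2, "o": r1 or r2}


def flip(w, names):
    """Copy of input world `w` with the nodes in `names` negated."""
    return {k: (not v) if k in names else v for k, v in w.items()}


def responsibility(x, w):
    """Simplified Halpern-Pearl-style responsibility of input `x` for o = True
    in world `w`: 1 / (k + 1) for the smallest contingency set G of k other
    inputs that makes o counterfactually depend on x, else 0.0."""
    actual = evaluate(w)
    others = [v for v in INPUTS if v != x]
    for k in range(len(others) + 1):
        for g in combinations(others, k):
            w_g = flip(w, g)
            # (a) Under contingency G, flipping x must make the output False.
            if evaluate(flip(w_g, [x]))["o"]:
                continue
            # (b) With x at its actual value, the output must stay True even
            # when any subset of intermediate nodes is reset to actual values.
            if all(evaluate(w_g, {z: actual[z] for z in zs})["o"]
                   for n in range(len(INTERMEDIATES) + 1)
                   for zs in combinations(INTERMEDIATES, n)):
                return 1.0 / (k + 1)
    return 0.0


def blame(x):
    """Expected responsibility of `x` under the posterior given the anomalous
    observation o = True, via exact inference by enumeration."""
    worlds = [dict(zip(INPUTS, vals))
              for vals in product([False, True], repeat=len(INPUTS))]
    evidence = [w for w in worlds if evaluate(w)["o"]]
    z = sum(joint(w) for w in evidence)  # P(o = True)
    return sum(joint(w) * responsibility(x, w) for w in evidence) / z


# Rank candidate errors (input tuples) by descending blame.
ranking = sorted(INPUTS, key=blame, reverse=True)
for name in ranking:
    print(f"{name}: blame = {blame(name):.3f}")
```

With these made-up numbers, `t1` and `t2` each receive a blame of about 0.882 and `t3` and `t4` about 0.118, so the sketch would point at `t1` and `t2` first. Enumeration is exponential in the number of input nodes; the chapter's stated aim is an efficient computation based on inference over the ABN, which this sketch does not attempt to reproduce.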