Abstract: Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management. The first is how to provide explanations for query results, e.g., "Why is this tuple in my result?" or "Why does this output tuple have such a high probability?" The second is how to determine the sensitive input tuples for a given query: users want to know which input tuples can substantially alter the output when their probabilities are modified, since they may be unsure about the input probability values. In addition to the output probabilities, existing systems provide the lineage/provenance of each output tuple, a Boolean formula indicating the dependence of the output tuple on the input tuples. However, lineage does not immediately provide a quantitative relationship, and it is not informative when there are multiple output tuples. In this paper, we propose a unified framework that handles both issues to facilitate robust query processing. We formally define the notions of influence and explanations and provide algorithms to determine the top-l influential set of variables and the top-l set of explanations for a variety of queries, including conjunctive queries, probabilistic threshold queries, top-k queries, and aggregation queries. Further, our framework naturally enables highly efficient incremental evaluation when input probabilities are modified (e.g., if uncertainty is resolved). Our preliminary experimental results demonstrate the benefits of our framework for performing robust query processing over probabilistic databases.
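To make the idea of sensitivity concrete, the following is a minimal sketch, not the paper's algorithm. It assumes independent input tuples and takes the influence of a tuple to be the change in output probability when that tuple goes from certainly absent to certainly present; the abstract does not state the formal definition, so this choice is an assumption. The lineage formula, the tuple names `x1` to `x3`, and all function names are hypothetical, and the brute-force enumeration of possible worlds is only practical for very small lineages.

```python
from itertools import product


def lineage_probability(lineage, probs):
    """Probability that the lineage is true, by enumerating all possible worlds.

    Assumes the input tuples are independent (an assumption of this sketch).
    """
    names = sorted(probs)
    total = 0.0
    for world in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, world))
        if lineage(assignment):
            # Weight of this world: product of p or (1 - p) per tuple.
            weight = 1.0
            for name in names:
                p = probs[name]
                weight *= p if assignment[name] else 1.0 - p
            total += weight
    return total


def influence(lineage, probs, name):
    """Output probability with `name` surely present minus surely absent."""
    present = {**probs, name: 1.0}
    absent = {**probs, name: 0.0}
    return lineage_probability(lineage, present) - lineage_probability(lineage, absent)


def top_l_influential(lineage, probs, l):
    """Return the l tuples with the largest absolute influence."""
    scores = {name: influence(lineage, probs, name) for name in probs}
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:l]


# Hypothetical lineage of one output tuple: (x1 AND x2) OR x3
def example_lineage(a):
    return (a["x1"] and a["x2"]) or a["x3"]


example_probs = {"x1": 0.9, "x2": 0.5, "x3": 0.2}

if __name__ == "__main__":
    print(lineage_probability(example_lineage, example_probs))
    print(top_l_influential(example_lineage, example_probs, 2))
```

In this example `x2` ranks above `x3` even though `x3` alone is enough to produce the output tuple, because `x1` is very likely present and so a change in `x2` carries through to the result. This is the kind of quantitative relationship that the lineage formula by itself does not show.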

Boosting Twig Joins in Probabilistic XML

Keywords filtering over probabilistic XML data

Answering Queries using Views over Probabilistic XML: Complexity and Tractability

Holistic Twig Joins Based on Sketch Tree

A System for Keyword Search on Probability XML Data

Aggregate Queries on Constrained Probabilistic Similarity Join Pairs

TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce

Efficient Holistic Twig Join Algorithm on XML Documents with Optimization Rules and Index

Exploit sequencing to accelerate XML twig query answering

The Optimization of Complex XML Queries over XML Streams under DTD

H-Tree: a hybrid structure for confidence computation in probabilistic databases

Complex Twig Pattern Query Processing over XML Streams

Scrubbing Query Results From Probabilistic Databases

Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases

Querying graphs with uncertain predicates

Keywords Query of uncertain spatiotemporal data based on XML

Sliding-Window Probabilistic Threshold Aggregate Queries on Uncertain Data Streams

Matching Of Twig Pattern With And/Or Predicates Over Xml Streams

Track: a Novel XML Join Algorithm for Efficient Processing Twig Queries

Scalable Probabilistic Databases with Factor Graphs and MCMC

Probery: A Probability-based Incomplete Query Optimization for Big Data