Abstract: The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we address the problem of on-the-fly clustering and ranking over probabilistic databases. We begin with a systematic exploration of ranking in probabilistic databases by viewing it as a multi-criteria optimization problem, and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that can approximate many of the previously proposed ranking functions. We present several novel algorithms for efficiently computing such ranking functions using generating functions, even over databases that exhibit complex correlation patterns modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and develop an approach to learn such parameters. We also develop a hierarchical framework for efficiently combining on-the-fly clustering and ranking (called a ClusterRank query) over probabilistic databases. Our framework is based on a general definition of clustering, called restricted soft-t clustering, where a tuple is allowed to participate in at most t clusters. We show how several of our ranking functions can be seamlessly integrated into this framework, which not only allows ranking to continue in parallel with clustering, but also enables pruning of a large portion of the search space. Finally, we present a comprehensive experimental study comparing different ranking functions and illustrating the effectiveness of our clustering framework.

Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases

Scrubbing Query Results From Probabilistic Databases

Sensitivity Analysis of Answer Ordering from Probabilistic Databases

A Unified Approach to Ranking in Probabilistic Databases

Computing and Maintaining Provenance of Query Result Probabilities in Uncertain Knowledge Graphs

Semantics and Evaluation of Top-k Queries in Probabilistic Databases

Probabilistic Robustness Analysis—Risks, Complexity, and Algorithms

Ranking and Clustering in Probabilistic Databases

Positive and Negative Explanations of Uncertain Reasoning in the Framework of Possibility Theory

Securing Databases from Probabilistic Inference

On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios

Efficient processing of probabilistic set-containment queries on uncertain set-valued data

Qualitative Propagation and Scenario-based Explanation of Probabilistic Reasoning

Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations

A Statistical Theory for the Analysis of Uncertain Systems

Estimation of Concept Explanations Should be Uncertainty Aware

Consensus Answers for Queries over Probabilistic Databases

BayesDB: A probabilistic programming system for querying the probable implications of data

A New Statistical Approach for the Analysis of Uncertain Systems

Bayesian probabilistic propagation of hybrid uncertainties: Estimation of response expectation function, its variable importance and bounds

Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions