Abstract: The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we address the problem of on-the-fly clustering and ranking over probabilistic databases. We begin with a systematic exploration of ranking in probabilistic databases by viewing it as a multi-criteria optimization problem, and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that can approximate many of the previously proposed ranking functions. We present several novel algorithms for efficiently computing such ranking functions using generating functions, even over databases that exhibit complex correlation patterns modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and develop an approach to learn such parameters. We also develop a hierarchical framework for efficiently combining on-the-fly clustering and ranking (called a ClusterRank query) over probabilistic databases. Our framework is based on a general definition of clustering, called restricted soft-t clustering, where a tuple is allowed to participate in at most t clusters. We show how several of our ranking functions can be seamlessly integrated into this framework, which not only allows ranking to continue in parallel with clustering, but also enables pruning of a large portion of the search space. Finally, we present a comprehensive experimental study comparing different ranking functions, and illustrating the effectiveness of our clustering framework.
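
To illustrate the generating-function idea mentioned in the abstract, here is a minimal sketch for the simplest case only: tuples that exist independently of one another, not the correlated and/xor-tree or Markov-network settings. It assumes the tuples are already sorted by descending score, and it scores each tuple by a weighted sum over its rank distribution. The function names, the weight function, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def rank_distributions(probs: Sequence[float]) -> List[List[float]]:
    """Rank distribution of each tuple under tuple independence.

    probs[i] is the existence probability of the i-th tuple in
    descending score order. The result satisfies
    dist[i][j] = Pr(tuple i exists and is ranked at position j + 1).
    """
    dist: List[List[float]] = []
    # coeffs[k] = Pr(exactly k of the higher-scored tuples exist), i.e. the
    # coefficient of x^k in the product of ((1 - p) + p * x) over those tuples.
    coeffs = [1.0]
    for p in probs:
        # Tuple i has rank k + 1 iff it exists and exactly k tuples above it exist.
        dist.append([p * c for c in coeffs])
        # Multiply the generating function by ((1 - p) + p * x).
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)
            nxt[k + 1] += c * p
        coeffs = nxt
    return dist


def weighted_rank_scores(
    probs: Sequence[float], weight: Callable[[int], float]
) -> List[float]:
    """Score each tuple as sum over j of weight(j) * Pr(rank == j), ranks 1-based."""
    return [
        sum(weight(j + 1) * pr for j, pr in enumerate(row))
        for row in rank_distributions(probs)
    ]


if __name__ == "__main__":
    # Hypothetical existence probabilities, highest score first.
    example = [0.5, 0.5, 0.9]
    # Step weight: 1 for ranks 1..2, else 0 (probability of being in the top 2).
    print(weighted_rank_scores(example, lambda j: 1.0 if j <= 2 else 0.0))
```

With a step weight that is 1 for ranks up to k and 0 otherwise, the score is the probability that the tuple appears in the top k. Other weight functions give other rankings, which is the sense in which a parameterized ranking function can cover several specific ones. This direct expansion takes quadratic time in the number of tuples.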

Bayesian Rank-Clustering

Bayesian Aggregation of Order-Based Rank Data

Rank-based Bayesian clustering via covariate-informed Mallows mixtures

Bayesian inferences on uncertain ranks and orderings: Application to ranking players and lineups

Revealing subgroup structure in ranked data using a Bayesian WAND

Efficient Estimation for Rank-Based Regression with Clustered Data

Ranking and Clustering in Probabilistic Databases

Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman's ρ

Bayesian Level-Set Clustering

Bayesian Framework for Causal Inference with Principal Stratification and Clusters

A Bayesian Approach to Restricted Latent Class Models for Scientifically-Structured Clustering of Multivariate Binary Outcomes

Bayesian rank penalization

Bayesian Clustering for Ordinal Data Based on Finite Mixture Models of Latent Variables

Bayesian ranking and selection with applications to field studies, economic mobility, and forecasting

Bayesian Plackett–Luce Mixture Models for Partially Ranked Data

Sparse Bayesian Learning for Ranking

The Bayesian Sorting Hat: A Decision-Theoretic Approach to Size-Constrained Clustering

Bayesian Clustering with Variable and Transformation Selections

Generalized linear mixed model with Bayesian rank likelihood

A review on Bayesian model-based clustering

Bayesian Mixture Models With Focused Clustering for Mixed Ordinal and Nominal Data