Fast Sampling for Time-Varying Determinantal Point Processes

Maoying Qiao,Richard Yi Da Xu,Wei Bian,Dacheng Tao
DOI: https://doi.org/10.1145/2943785
IF: 4.157
2016-01-01
ACM Transactions on Knowledge Discovery from Data
Abstract:Determinantal Point Processes (DPPs) are stochastic models which assign each subset of a base dataset with a probability proportional to the subset's degree of diversity. It has been shown that DPPs are particularly appropriate in data subset selection and summarization (e.g., news display, video summarizations). DPPs prefer diverse subsets while other conventional models cannot offer. However, DPPs inference algorithms have a polynomial time complexity which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs which takes advantage of the nature of some time-varying data (e.g., news corpora updating, communication network evolving), where the data changes between time stamps are relatively small. The proposed algorithm is built upon the simplification of marginal density functions over successive time stamps and the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron Corpus confirm the efficiency of the proposed algorithm.
What problem does this paper attempt to address?