Abstract: We study the fundamental problem of sampling independent events, called subset sampling. Specifically, consider a set of $n$ events $S=\{x_1, \ldots, x_n\}$, where each event $x_i$ has an associated probability $p(x_i)$. The subset sampling problem aims to sample a subset $T \subseteq S$ such that every $x_i$ is independently included in $T$ with probability $p(x_i)$. A naive solution is to flip a coin for each event, which takes $O(n)$ time. The goal, however, is to develop data structures that allow drawing a sample in time proportional to the expected output size $\mu=\sum_{i=1}^n p(x_i)$, which can be significantly smaller than $n$ in many applications. The subset sampling problem serves as an important building block in many tasks and has been studied for more than a decade. However, most existing subset sampling approaches are designed for a static setting, where the events in $S$ and their associated probabilities are not allowed to change over time. These algorithms incur either large query time or large update time in a dynamic setting, even though time-evolving events with changing probabilities are ubiquitous in real life. Designing efficient dynamic subset sampling algorithms is therefore a pressing need, yet it remains an open problem. In this paper, we propose ODSS, the first optimal dynamic subset sampling algorithm. The expected query time and update time of ODSS are both optimal, matching the lower bounds of the subset sampling problem. We present a nontrivial theoretical analysis to demonstrate the superiority of ODSS. We also conduct comprehensive experiments to empirically evaluate the performance of ODSS. Moreover, we apply ODSS to a concrete application: influence maximization. We empirically show that ODSS can improve the complexities of existing influence maximization algorithms on large real-world evolving social networks.
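
To make the problem statement concrete, the following Python sketch contrasts the naive $O(n)$ coin-flipping baseline mentioned in the abstract with a well-known output-sensitive technique for the special case where every event has the same probability $p$: jumping between selected indices with geometrically distributed skips. This is not ODSS, whose data structure is not described in the abstract; the function names `naive_subset_sample` and `uniform_subset_sample` are hypothetical and chosen only for illustration.

```python
import math
import random


def naive_subset_sample(probs, rng):
    """Naive baseline: flip one independent coin per event.

    Always costs O(n) time, even when the expected output size
    mu = sum(probs) is much smaller than n.
    """
    return [i for i, p in enumerate(probs) if rng.random() < p]


def uniform_subset_sample(n, p, rng):
    """Illustrative special case (NOT ODSS): all n events share probability p.

    The gap between consecutive selected indices is geometric with
    parameter p, so the sampler jumps directly from one selected index
    to the next. Expected time is O(1 + n * p).
    """
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(n))
    log_q = math.log1p(-p)  # log(1 - p), negative for 0 < p < 1
    sample = []
    i = -1
    while True:
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        i += int(math.log(u) / log_q) + 1  # geometric gap, at least 1
        if i >= n:
            return sample
        sample.append(i)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 1_000_000, 1e-5
    mu = n * p  # expected output size
    print("expected size:", mu)
    print("sampled size :", len(uniform_subset_sample(n, p, rng)))
```

With $n = 10^6$ and $p = 10^{-5}$, the naive routine would draw a million random numbers to return roughly ten events, while the skip-based routine draws about one random number per returned event. Handling arbitrary, changing probabilities $p(x_i)$ with comparable efficiency is the harder problem the paper addresses.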

Dynamic Sampling from a Discrete Probability Distribution with a Known Distribution of Rates

Non-Stochastic CDF Estimation Using Threshold Queries

Dynamic Sampling Allocation and Design Selection

Dynamic MCMC Sampling

Optimal Approximate Sampling from Discrete Probability Distributions

Dynamic programming of some sequential sampling design

Data Structures for Density Estimation

Dynamic Sampling Procedure for Decomposable Random Networks

Statistical-Computational Trade-offs for Density Estimation

The Randomness Recycler: A new technique for perfect sampling

Sampling with probability matching

Dynamic Sampling from Graphical Models

Optimal Dynamic Subset Sampling: Theory and Applications

Sampling low-fidelity outputs for estimation of high-fidelity density and its tails

Optimal Sampling and Scheduling for Timely Status Updates in Multi-Source Networks

Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers

Numerical Estimation of Limiting Large-Deviation Rate Functions

Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces

Dynamic Inference in Probabilistic Graphical Models

Sampling depth trade-off in function estimation under a two-level design

Dynamic Sampling Allocation for Selecting a Good Enough Alternative