Abstract: This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.
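To make the setting concrete, here is a minimal Python sketch under simplifying assumptions; it is not the paper's algorithm. The hypothetical `quantize` rounds a target distribution to integer weights $M_i$ with $\sum_i M_i = 2^k$, so each $\hat{p}_i = M_i / 2^k$ fits in $k$ bits. Its largest-remainder rule minimizes total variation distance only within this dyadic class, which is narrower than the family of samplers and accuracy measures the abstract describes. The hypothetical `knuth_yao_sample` is the classical Knuth-Yao walk on a discrete distribution generating (DDG) tree: it consumes fair bits and is entropy-optimal for the distribution it samples. The hypothetical `expected_bits` computes that sampler's expected bit consumption exactly.

```python
import math
import random
from fractions import Fraction
from typing import Callable, List, Sequence


def quantize(p: Sequence[float], k: int) -> List[int]:
    """Round p to integers M with sum(M) == 2**k (largest-remainder rule).

    Among distributions whose probabilities are multiples of 2**-k, this
    choice minimizes total variation distance to p. Floats are converted
    to exact Fractions, so the rounding itself has no floating-point error.
    """
    total = sum(Fraction(x) for x in p)
    scaled = [Fraction(x) / total * 2**k for x in p]
    floors = [math.floor(s) for s in scaled]
    deficit = 2**k - sum(floors)  # exact integer in [0, len(p))
    # Give the leftover units to the entries with the largest remainders
    # (stable sort, so ties go to the lower index).
    order = sorted(range(len(p)), key=lambda i: scaled[i] - floors[i], reverse=True)
    for i in order[:deficit]:
        floors[i] += 1
    return floors


def knuth_yao_sample(M: Sequence[int], k: int,
                     flip: Callable[[], int] = lambda: random.getrandbits(1)) -> int:
    """Return index i with probability M[i] / 2**k, consuming fair bits from flip()."""
    assert sum(M) == 2**k
    for i, m in enumerate(M):
        if m == 2**k:           # point mass: no randomness needed
            return i
    d = 0                       # position of the current node within its tree level
    for col in range(k):        # column col holds the 2**-(col+1) digit of each p_i
        d = 2 * d + (1 - flip())
        for i in reversed(range(len(M))):
            d -= (M[i] >> (k - 1 - col)) & 1   # binary digit of M[i] / 2**k
            if d == -1:         # landed on a leaf labeled i
                return i
    raise AssertionError("unreachable when sum(M) == 2**k")


def expected_bits(M: Sequence[int], k: int) -> Fraction:
    """Expected flips used by knuth_yao_sample: sum over depths j of j * (#leaves at j) / 2**j."""
    if 2**k in M:
        return Fraction(0)
    return sum(
        (Fraction((col + 1) * sum((m >> (k - 1 - col)) & 1 for m in M), 2 ** (col + 1))
         for col in range(k)),
        Fraction(0),
    )
```

For example, `quantize([0.1, 0.2, 0.7], 4)` gives `[2, 3, 11]`, and `expected_bits([2, 3, 11], 4)` is `Fraction(17, 8)`, i.e. 2.125 fair bits per sample on average. Because the walk terminates within $k$ flips, feeding it every one of the $2^k$ bit strings of length $k$ yields outcome $i$ exactly $M_i$ times, which is a convenient exhaustive check of exactness.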

Optimal Approximate Sampling from Discrete Probability Distributions