The Probability to Hit Every Bin with a Linear Number of Balls

Stefan Walzer
2024-03-02
Abstract: Assume that $2n$ balls are thrown independently and uniformly at random into $n$ bins. We consider the unlikely event $E$ that every bin receives at least one ball, showing that $\Pr[E] = \Theta(b^n)$ where $b \approx 0.836$. Note that, due to correlations, $b$ is not simply the probability that any single bin receives at least one ball. More generally, we consider the event that throwing $\alpha n$ balls into $n$ bins results in at least $d$ balls in each bin.
Probability, Data Structures and Algorithms
What problem does this paper attempt to address?
The paper asks how likely it is that every bin receives at least a prescribed number of balls when balls are thrown into bins uniformly at random. In the basic setting, \(2n\) balls are thrown independently and uniformly at random into \(n\) bins, and \(E\) is the event that every bin receives at least one ball. The author shows that \(\Pr[E]=\Theta(b^{n})\) for a constant \(b\approx 0.836\). Because the bin loads are correlated, \(b\) is not simply the probability that a single bin receives at least one ball.

More generally, the paper considers the event that throwing \(\alpha n\) balls into \(n\) bins leaves at least \(d\) balls in every bin, for parameters \(\alpha\) and \(d\) with \(\alpha\geq d\). The analysis uses a distribution \(\Phi(\alpha, d)\): a Poisson distribution truncated to values \(\geq d\) and adjusted so that its expectation is \(\alpha\). The results are:

1. If \(\alpha\) and \(d\) are constants with \(\alpha > d\), then \(\Pr[E]=\Theta(b^{n})\), where \(b = \frac{\alpha^{\alpha}\zeta}{e^{\alpha}\lambda^{\alpha - d}}\) and \(\lambda\), \(\zeta\) are determined by \(\alpha\) and \(d\) through \(\Phi(\alpha, d)\).
2. If \(\alpha = d\) (not necessarily constant), every bin must receive exactly \(d\) balls, and \(\Pr[E]=\Theta(b^{n}\sqrt{dn})\), where \(b=\frac{d^{d}}{e^{d}d!}\). This follows from applying Stirling's formula to \(\Pr[E]=\frac{(dn)!}{(d!)^{n}n^{dn}}\).

The paper's main contribution is a precise asymptotic characterization of these probabilities, which is relevant to computer science topics involving random hash functions, such as minimal perfect hash functions. It also describes how to compute the constants and reports numerical values that can serve for reference and verification.
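The constant \(b\approx 0.836\) for \(\alpha = 2\), \(d = 1\) can be checked numerically. The following Python sketch is not taken from the paper and rests on two assumptions that the text above does not spell out: \(\lambda\) is taken to be the rate of the underlying Poisson distribution for which the truncated mean equals \(\alpha\), and \(\zeta\) is taken to be \(\sum_{i\geq d}\lambda^{i-d}/i!\). All function names are hypothetical. Under these assumptions the formula for \(b\) evaluates to about 0.836, and the exact probability for \(d = 1\), computed by inclusion-exclusion, stays within a constant factor of \(b^{n}\).

```python
import math
from fractions import Fraction


def series_tail(lam, k, terms=200):
    """Sum of lam**i / i! over i >= k, truncated after `terms` terms."""
    term = lam**k / math.factorial(k)
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= lam / (i + 1)
    return total


def truncated_mean(lam, d):
    """Mean of a Poisson(lam) variable conditioned on being >= d."""
    # sum_{i>=d} i * lam^i / i!  equals  lam * sum_{i>=d-1} lam^i / i!  for d >= 1;
    # max(d - 1, 0) also covers d = 0, where the mean is simply lam.
    return lam * series_tail(lam, max(d - 1, 0)) / series_tail(lam, d)


def solve_lambda(alpha, d, iterations=200):
    """Bisection for the rate lam whose truncated mean equals alpha (needs alpha > d)."""
    if alpha <= d:
        raise ValueError("this sketch only handles alpha > d")
    # The truncated mean is increasing in lam, tends to d as lam -> 0,
    # and is at least lam, so the root lies in (0, alpha].
    lo, hi = 1e-12, float(alpha)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if truncated_mean(mid, d) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def base(alpha, d):
    """b = alpha^alpha * zeta / (e^alpha * lam^(alpha - d))."""
    lam = solve_lambda(alpha, d)
    # ASSUMPTION: zeta = sum_{i>=d} lam^(i-d) / i!  (not defined in the text above).
    zeta = series_tail(lam, d) / lam**d
    return alpha**alpha * zeta / (math.exp(alpha) * lam ** (alpha - d))


def exact_probability_d1(n, m):
    """Exact Pr[every one of n bins gets >= 1 of m balls], by inclusion-exclusion."""
    surjections = sum(
        (-1) ** k * math.comb(n, k) * (n - k) ** m for k in range(n + 1)
    )
    return float(Fraction(surjections, n**m))


if __name__ == "__main__":
    b = base(2, 1)
    print(f"b = {b:.4f}")
    for n in (10, 50, 100):
        # A ratio that stays bounded as n grows is consistent with Theta(b^n).
        print(n, exact_probability_d1(n, 2 * n) / b**n)
```

The computed \(b\) is smaller than \(1 - e^{-2}\approx 0.865\), the limiting probability that one fixed bin is non-empty, which illustrates the remark about correlations. The series truncation at 200 terms is a simplification that is adequate only for moderate \(\lambda\).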