The Critical Beta-splitting Random Tree: Heights and Related Results

David Aldous, Boris Pittel
2024-09-06
Abstract: In the critical beta-splitting model of a random $n$-leaf binary tree, leaf-sets are recursively split into subsets, and a set of $m$ leaves is split into subsets containing $i$ and $m-i$ leaves with probabilities proportional to $1/(i(m-i))$. We study the continuous-time model in which the holding time before that split is exponential with rate $h_{m-1}$, the harmonic number. We (sharply) evaluate the first two moments of the time-height $D_n$ and of the edge-height $L_n$ of a uniform random leaf (that is, the length of the path from the root to the leaf), and prove the corresponding CLTs. We find the limiting value of the correlation between the heights of two random leaves of the same tree realization, and analyze the expected number of splits necessary for a set of $t$ leaves to partially or completely break away from each other. We give tail bounds for the time-height and the edge-height of the *tree*, that is, the maximal leaf heights. We show that there is a limit distribution for the size of a uniform random subtree, and derive the asymptotics of the mean size. Our proofs are based on asymptotic analysis of the attendant (sum-type) recurrences. The essential idea is to replace such a recursive equality by a pair of recursive inequalities for which matching asymptotic solutions can be found, allowing one to bound, both ways, the elusive explicit solution of the recursive equality. This reliance on recursive inequalities necessitates usage of Laplace transforms rather than Fourier characteristic functions.
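The model in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: the function names (`harmonic_numbers`, `expected_heights`, `simulate_leaf`) are hypothetical. It relies on one step worked out from the stated split rule: normalizing $1/(i(m-i))$ over $i = 1, \dots, m-1$ gives split probability $m/(2 h_{m-1}\, i(m-i))$, so the part containing a uniform random leaf has size $j$ with probability $1/(h_{m-1}(m-j))$. Under that assumption, the sketch computes $E[D_n]$ and $E[L_n]$ exactly from the resulting sum-type recurrence and also samples the two heights of a random leaf.

```python
import math
import random

ZETA2 = math.pi ** 2 / 6  # zeta(2)


def harmonic_numbers(n):
    """Return a list h with h[k] = 1 + 1/2 + ... + 1/k and h[0] = 0."""
    h = [0.0] * (n + 1)
    for k in range(1, n + 1):
        h[k] = h[k - 1] + 1.0 / k
    return h


def expected_heights(n):
    """Exact E[D_m] and E[L_m] for m = 1..n, by conditioning on the first split.

    Assumption (derived from the split rule in the abstract): from a set of
    size m, the part containing the random leaf has size j with probability
    1 / (h[m-1] * (m - j)), for j = 1..m-1.
    """
    h = harmonic_numbers(n)
    exp_d = [0.0] * (n + 1)  # time-height: mean holding time is 1 / h[m-1]
    exp_l = [0.0] * (n + 1)  # edge-height: each split adds one edge
    for m in range(2, n + 1):
        s_d = sum(exp_d[j] / (m - j) for j in range(1, m))
        s_l = sum(exp_l[j] / (m - j) for j in range(1, m))
        exp_d[m] = (1.0 + s_d) / h[m - 1]
        exp_l[m] = 1.0 + s_l / h[m - 1]
    return exp_d, exp_l


def simulate_leaf(n, rng, h=None):
    """Sample (time-height, edge-height) of a uniform random leaf."""
    if h is None:
        h = harmonic_numbers(n)
    m, time_height, edge_height = n, 0.0, 0
    while m > 1:
        # Exponential holding time with rate h_{m-1} before the split.
        time_height += rng.expovariate(h[m - 1])
        # Size of the part that keeps the leaf, weight proportional to 1/(m-j).
        sizes = list(range(1, m))
        weights = [1.0 / (m - j) for j in sizes]
        m = rng.choices(sizes, weights=weights)[0]
        edge_height += 1
    return time_height, edge_height


if __name__ == "__main__":
    n = 400
    exp_d, exp_l = expected_heights(n)
    print("E[D_n] from recurrence:", exp_d[n])
    print("log(n) / zeta(2):      ", math.log(n) / ZETA2)
    print("E[L_n] from recurrence:", exp_l[n])
    print("one sampled leaf:      ", simulate_leaf(n, random.Random(0)))
```

The recurrence costs $O(n^2)$ operations, which is fine for moderate $n$. Comparing `exp_d[n]` with `math.log(n) / ZETA2` shows the bounded gap that the $O(1)$ term in the first-moment estimate for $D_n$ describes.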
Probability
What problem does this paper attempt to address?
This paper studies the heights of leaves and of the whole tree, together with related properties, in the critical beta-splitting random tree model. Specifically, it focuses on the following points:

1. **Time height and edge height**: Evaluate the first two moments of the time height \(D_n\) and the edge height \(L_n\) of a uniform random leaf, and prove the corresponding central limit theorems (CLTs).
2. **Correlation of leaf heights**: Analyze the correlation between the heights of two random leaves in the same tree realization.
3. **Number of splits required for partial or complete separation**: Study the expected number of splits needed for a set of leaves to become partially or completely separated.
4. **Tail bounds of time height and edge height**: Give tail bounds for the maximum leaf height (in both the time-height and edge-height senses) of the tree.
5. **Limit distribution of the size of a random subtree**: Show that the size of a uniform random subtree has a limit distribution and derive the asymptotic behavior of its mean.

### Specific problems and methods

- **Moment estimation of time height \(D_n\) and edge height \(L_n\)**:
  - The paper gives sharp asymptotic formulas for the first two moments of \(D_n\) and \(L_n\) through analysis of the underlying recurrences.
  - For the time height \(D_n\):
    \[ E[D_n]=\frac{1}{\zeta(2)}\log n+O(1) \]
    \[ \text{var}(D_n)=(1 + o(1))\frac{2\zeta(3)}{\zeta^{3}(2)}\log n \]
  - For the edge height \(L_n\):
    \[ E[L_n]=\frac{1}{2\zeta(2)}\log^{2}n+\frac{\gamma\zeta(2)+\zeta(3)}{\zeta^{2}(2)}\log n+O(1) \]
    \[ \text{var}(L_n)=\frac{2\zeta(3)}{3\zeta^{3}(2)}\log^{3}n+O(1) \]
- **Correlation of leaf heights**:
  - The paper studies the correlation coefficient \(r_n\) between the time heights \(D_n^{(1)}\) and \(D_n^{(2)}\) of two different leaves and proves that, under certain assumptions, \(r_n = O(\log^{-1}n)\); that is, the two heights are asymptotically uncorrelated.
- **Number of splits required for partial or complete separation**:
  - Analyze the expected number of splits needed for a set of \(t\) leaves to become partially or completely separated.
- **Tail bounds of time height and edge height**:
  - Give tail bounds for the maximum leaf height, for example:
    \[ P\left(D_n\geq(2+\epsilon)\log n\right)\leq\frac{1}{n^{\rho\epsilon}} \]
    \[ P\left(L_n\geq(1 + \epsilon)\beta\log_2n\right)\leq\exp\left(-\Theta(\epsilon\log n)\right) \]
- **Limit distribution of the size of a random subtree**:
  - Prove that the size of a uniform random subtree has a limit distribution and derive the asymptotic behavior of its mean.

### Methodology

- **Recursive analysis**: The paper relies mainly on asymptotic analysis of recurrences. Each recursive equality is replaced by a pair of recursive inequalities with matching asymptotic solutions, which bound the exact solution from both sides.
- **Laplace transform**: Because the heights are far from being sums of independent terms, the paper works with real-valued Laplace transforms, which it bounds above and below recursively, rather than with Fourier characteristic functions.
- **Central limit theorem**: Combining the recursive analysis with these Laplace transform bounds yields the central limit theorems for the time height and the edge height.

### Conclusion

The paper through detailed mathematical analysis...