Tree height and the asymptotic mean of the Colijn-Plazzotta rank of unlabeled binary rooted trees
Luc Devroye, Michael R. Doboli, Noah A. Rosenberg, Stephan Wagner
2024-09-28
Abstract: The Colijn--Plazzotta ranking is a bijective encoding of the unlabeled binary rooted trees with positive integers. We show that the rank $f(t)$ of a tree $t$ is closely related to its height $h$, the length of the longest path from a leaf to the root. We consider the rank $f(\tau_n)$ of a random $n$-leaf tree $\tau_n$ under each of three models: (i) uniformly random unlabeled unordered binary rooted trees, or unlabeled topologies; (ii) uniformly random leaf-labeled binary trees, or labeled topologies under the uniform model; and (iii) random binary search trees, or labeled topologies under the Yule--Harding model. Relying on the close relationship between tree rank and tree height, we obtain results concerning the asymptotic properties of $\log \log f(\tau_n)$. In particular, we find $\mathbb{E} \{\log_2 \log f(\tau_n)\} \sim 2 \sqrt{\pi n}$ for uniformly random unlabeled ordered binary rooted trees and uniformly random leaf-labeled binary trees, and for a constant $\alpha \approx 4.31107$, $\mathbb{E}\{\log_2 \log f(\tau_n)\} \sim \alpha \log n$ for leaf-labeled binary trees under the Yule--Harding model. We show that the mean of $f(\tau_n)$ itself under the three models is largely determined by the rank $c_{n-1}$ of the highest-ranked tree, the caterpillar, obtaining an asymptotic relationship with $\pi_n c_{n-1}$, where $\pi_n$ is a model-specific function of $n$. The results resolve open problems, providing a new class of results on an encoding useful in mathematical phylogenetics.
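As an illustration of the encoding, the following Python sketch computes ranks and heights of small trees. It assumes the Colijn--Plazzotta recursion as it is usually stated (a leaf has rank 1, and an internal node whose two subtrees have ranks $k \ge j$ has rank $k(k-1)/2 + j + 1$), which this abstract does not spell out, together with a hypothetical nested-tuple representation of trees. The indexing $c_0 = 1$ is chosen so that $c_{n-1}$ is the rank of the $n$-leaf caterpillar, matching the abstract's notation, and the Yule--Harding sampler assumes that the left subtree receives a number of leaves uniform on $\{1, \dots, n-1\}$. All function names are illustrative and are not taken from the paper.

```python
import math
import random

# Hypothetical representation for this sketch: a leaf is None and an internal
# node is a 2-tuple (left, right). Child order does not affect the rank.


def cp_rank(t):
    """Colijn-Plazzotta rank (assumed recursion): a leaf has rank 1; a node
    whose children have ranks k >= j has rank k*(k-1)/2 + j + 1."""
    if t is None:
        return 1
    a, b = cp_rank(t[0]), cp_rank(t[1])
    k, j = max(a, b), min(a, b)
    return k * (k - 1) // 2 + j + 1


def height(t):
    """Edges on the longest root-to-leaf path; a lone leaf has height 0."""
    if t is None:
        return 0
    return 1 + max(height(t[0]), height(t[1]))


def caterpillar(n):
    """n-leaf caterpillar: every internal node has a leaf as one child."""
    t = None
    for _ in range(n - 1):
        t = (t, None)
    return t


def caterpillar_ranks(m):
    """[c_0, ..., c_m] with c_0 = 1 and c_i = c_{i-1}(c_{i-1}-1)/2 + 2,
    so that c_{n-1} is the rank of the n-leaf caterpillar."""
    cs = [1]
    for _ in range(m):
        c = cs[-1]
        cs.append(c * (c - 1) // 2 + 2)
    return cs


def ranks_by_leaf_count(n_max):
    """Map n -> set of ranks of all unlabeled n-leaf trees, built bottom-up."""
    ranks = {1: {1}}
    for n in range(2, n_max + 1):
        s = set()
        for a in range(1, n // 2 + 1):  # the smaller subtree has a leaves
            for x in ranks[a]:
                for y in ranks[n - a]:
                    k, j = max(x, y), min(x, y)
                    s.add(k * (k - 1) // 2 + j + 1)
        ranks[n] = s
    return ranks


def yule_harding(n, rng):
    """Random n-leaf tree under the Yule-Harding model (assumed here): the
    left subtree receives i leaves, with i uniform on {1, ..., n-1}."""
    if n == 1:
        return None
    i = rng.randint(1, n - 1)
    return (yule_harding(i, rng), yule_harding(n - i, rng))


if __name__ == "__main__":
    # Caterpillar: log2(log(rank)) minus height settles toward a constant.
    cs = caterpillar_ranks(15)
    for n in range(4, 17):
        print(n, height(caterpillar(n)), math.log2(math.log(cs[n - 1])) - (n - 1))
    # One Yule-Harding sample: compare its height with log2(log(rank)).
    t = yule_harding(24, random.Random(1))
    print(height(t), math.log2(math.log(cp_rank(t))))
```

A heuristic reading of the recursion: the rank is roughly the square of the larger child's rank, halved, so $\log f$ roughly doubles and $\log_2 \log f$ increases by about 1 per level along the chain of larger-ranked children. This is consistent with the abstract's point that $\log_2 \log f(t)$ tracks the height. Because ranks grow doubly exponentially in the height, exact computation with Python's arbitrary-precision integers is only practical for trees of modest height.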
Combinatorics, Probability