The distributions under two species-tree models of the total number of ancestral configurations for matching gene trees and species trees
Filippo Disanto, Michael Fuchs, Chun-Yen Huang, Ariel R. Paningbatan, Noah A. Rosenberg
2023-05-07
Abstract: Given a gene-tree labeled topology $G$ and a species tree $S$, the "ancestral configurations" at an internal node $k$ of $S$ represent the combinatorially different sets of gene lineages that can be present at $k$ when all possible realizations of $G$ in $S$ are considered. Ancestral configurations have been introduced as a data structure for evaluating the conditional probability of a gene-tree labeled topology given a species tree, and their enumeration assists in describing the complexity of this computation. When the gene-tree labeled topology $G=t$ matches that of the species tree $S$, we use techniques of analytic combinatorics to study distributional properties of the "total" number of ancestral configurations measured across the different nodes of a random labeled topology $t$ selected under the uniform and the Yule probability models. Under both of these probabilistic scenarios, we show that the total number $T_n$ of ancestral configurations of a random labeled topology of $n$ taxa asymptotically follows a lognormal distribution. Over uniformly distributed labeled topologies, the mean and the variance of $T_n$ are found to grow asymptotically as $\mathbb{E}_{\rm U}[T_n] \sim 2.449 \cdot 1.333^n$ and $\mathbb{V}_{\rm U}[T_n] \sim 5.050 \cdot 1.822^n$, respectively. Under the Yule model, which assigns higher probabilities to more balanced labeled topologies, we obtain the mean $\mathbb{E}_{\rm Y}[T_n] \sim 1.425^n$ and the variance $\mathbb{V}_{\rm Y}[T_n] \sim 2.045^n$.
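To make the quantity $T_n$ concrete, the following minimal sketch counts the total number of ancestral configurations for a single small tree in the matching case. It relies on an assumption that is not stated in the abstract: that the number of configurations at an internal node with child subtrees $t_L$ and $t_R$ satisfies $c(t) = (c(t_L)+1)(c(t_R)+1)$, with $c = 0$ at a leaf, and that the total is the sum of $c$ over all internal nodes. The function name `count_configurations` and the nested-tuple tree encoding are hypothetical choices made only for this illustration.

```python
def count_configurations(tree):
    """Return (configs_at_root, total_over_internal_nodes) for a binary tree.

    A tree is either a leaf (any non-tuple value, e.g. a taxon name)
    or a 2-tuple (left_subtree, right_subtree).

    ASSUMPTION (not taken from the abstract): in the matching case,
    c(t) = (c(t_L) + 1) * (c(t_R) + 1), with c(leaf) = 0.
    """
    if not isinstance(tree, tuple):
        # A leaf is not an internal node, so it contributes nothing.
        return 0, 0
    left, right = tree
    c_left, total_left = count_configurations(left)
    c_right, total_right = count_configurations(right)
    # Each child branch passes up either one of its own configurations
    # or its single fully coalesced lineage (the "+ 1" terms).
    c_root = (c_left + 1) * (c_right + 1)
    return c_root, c_root + total_left + total_right


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")
    balanced = (("a", "b"), ("c", "d"))
    print(count_configurations(caterpillar))
    print(count_configurations(balanced))
```

Under this assumed recursion, the two four-taxon shapes have different counts at the root but the same total, which illustrates why the total is summed over all internal nodes rather than read off the root alone.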
Probability, Combinatorics, Populations and Evolution