Estimating Species Trees from Quartet Gene Tree Distributions under the Coalescent Model

Martin Kreidl
DOI: https://doi.org/10.48550/arXiv.1108.1628
2011-08-08
Abstract: In this article we propose a new method, 'quartet neighbor joining' or 'quartet-NJ', to infer an unrooted species tree on a given set of taxa T from empirical distributions of unrooted quartet gene trees on all four-taxon subsets of T. In particular, quartet-NJ can be used to estimate a species tree on T from distributions of gene trees on T. The quartet-NJ algorithm is conceptually very similar to classical neighbor joining, and its statistical consistency under the multispecies coalescent model is proved using a variant of the classical 'cherry picking' theorem. To demonstrate the suitability of quartet-NJ, coalescent processes on two different species trees (on five and nine taxa, respectively) were simulated, and quartet-NJ was applied to the simulated gene tree distributions. Further, quartet-NJ was applied to quartet distributions obtained from multiple sequence alignments of 28 proteins of nine prokaryotes.
Populations and Evolution, Genomics
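
To illustrate how an empirical quartet gene tree distribution carries information about the species tree, the following is a minimal sketch and not necessarily the procedure used in the paper. It relies on a standard result for the multispecies coalescent: on a four-taxon species tree whose internal branch has length t (in coalescent units), the unrooted gene tree topology matching the species tree has probability 1 - (2/3)exp(-t), and each of the other two topologies has probability (1/3)exp(-t). The function name `quartet_branch_length` and the topology labels such as "AB|CD" are hypothetical, chosen only for this example.

```python
import math


def quartet_branch_length(counts):
    """Estimate the dominant quartet topology and its internal branch length.

    counts maps each of the three unrooted quartet topologies (hypothetical
    labels such as "AB|CD") to the number of gene trees displaying it.
    Returns (topology, t), where t is in coalescent units.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gene trees observed for this quartet")

    # The most frequent topology is taken as the species quartet.
    topology, n = max(counts.items(), key=lambda kv: kv[1])
    p = n / total

    # Invert p = 1 - (2/3) * exp(-t); clamp the degenerate cases.
    if p <= 1.0 / 3.0:
        return topology, 0.0
    if p >= 1.0:
        return topology, math.inf
    return topology, -math.log(1.5 * (1.0 - p))


if __name__ == "__main__":
    observed = {"AB|CD": 70, "AC|BD": 15, "AD|BC": 15}
    print(quartet_branch_length(observed))
```

Branch-length estimates of this kind, computed for every four-taxon subset, are one plausible input for a neighbor-joining style procedure; how quartet-NJ actually combines them is described in the paper itself.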