TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting

Iker Rivas-González, Mikkel H. Schierup, John Wakeley, Asger Hobolth
DOI: https://doi.org/10.1371/journal.pgen.1010836
IF: 4.5
2024-02-09
PLoS Genetics
Abstract: Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
Author summary: DNA sequences can be compared to reconstruct the evolutionary history of different species. While the ancestral history is usually represented by a single phylogenetic tree, speciation is a more complex process, and, due to the effect of recombination, different parts of the genome might follow different genealogies. For example, even though humans are more closely related to chimps than to gorillas, around 15% of our genome is more similar to the gorilla genome than to the chimp one. Even for those parts of the genome that do follow the same human-chimp topology, we might encounter a last common ancestor at different time points in the past for different genomic fragments.
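To give a rough sense of where a figure like 15% can come from, the sketch below uses the textbook three-species multispecies coalescent result: if the two lineages in the ancestral population fail to coalesce along an internal branch of length T (in coalescent units), each of the three topologies is equally likely, so each discordant topology has probability exp(-T)/3. This is not the TRAILS model itself. It assumes a constant ancestral population size and no gene flow, and the function names and the A/B/C species labels are hypothetical, chosen only for illustration.

```python
import math


def topology_probabilities(internal_branch: float) -> dict:
    """Gene-tree topology probabilities for the species tree ((A,B),C).

    internal_branch is the length of the branch ancestral to A and B,
    in coalescent units (assumption: constant population size, no gene flow).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do NOT coalesce on the internal branch.
    p_no_coalescence = math.exp(-internal_branch)
    # In that case all three topologies are equally likely in the deeper ancestor.
    discordant = p_no_coalescence / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


def branch_length_for_discordance(p_single: float) -> float:
    """Internal branch length at which EACH discordant topology has probability p_single."""
    if not 0.0 < p_single <= 1.0 / 3.0:
        raise ValueError("p_single must lie in (0, 1/3]")
    return -math.log(3.0 * p_single)


if __name__ == "__main__":
    # Illustrative only: treat A, B, C as human, chimp, gorilla and ask which
    # branch length would give 15% for one discordant topology.
    t = branch_length_for_discordance(0.15)
    print(f"internal branch length: {t:.3f} coalescent units")
    print(topology_probabilities(t))
```

Under these simplifying assumptions, a single discordance fraction pins down only the internal branch length in coalescent units, which is a ratio of time to population size. Separating speciation times from ancestral population sizes needs additional signal, such as the coalescent times that the text says vary between genomic fragments.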
Here, we present TRAILS, a new framework that utilizes the information contained in all these genealogies to reconstruct the speciation process. TRAILS infers unbiased estimates of the speciation times and the ancestral effective population sizes, improving accuracy compared to previous methods. TRAILS also reconstructs the genealogy at the highest resolution, inferring, for example, when common ancestry was found for different parts of the genome. This information can also be used to detect deviations from neutrality, effectively inferring natural selection that happened millions of years ago. We validate the method using extensive simulations, and we apply TRAILS to a human-chimp-gorilla multiple genome alignment, from which we recover speciation parameters that are in good agreement with previous estimates.
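The posterior decoding mentioned in the abstract can be illustrated with a generic forward-backward pass on a toy hidden Markov model. The sketch below is not the TRAILS implementation. It assumes only three hidden states (one per topology), whereas the abstract describes states that also carry two coalescent events in discretized time intervals. The function name, the site-pattern encoding, and every probability in the example are made up for illustration.

```python
from typing import List, Sequence


def posterior_decoding(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> List[List[float]]:
    """Return post[t][k] = P(hidden state k at site t | all observations)."""
    n_states, n_sites = len(init), len(obs)
    if n_sites == 0:
        return []

    # Forward pass, rescaled at every site to avoid numerical underflow.
    fwd: List[List[float]] = []
    scales: List[float] = []
    for t in range(n_sites):
        if t == 0:
            raw = [init[k] * emit[k][obs[0]] for k in range(n_states)]
        else:
            raw = [
                emit[k][obs[t]]
                * sum(fwd[t - 1][j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        scale = sum(raw)
        if scale == 0.0:
            raise ValueError(f"observation at site {t} has zero probability")
        fwd.append([v / scale for v in raw])
        scales.append(scale)

    # Backward pass, reusing the forward scaling factors.
    bwd = [[1.0] * n_states for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        for j in range(n_states):
            bwd[t][j] = sum(
                trans[j][k] * emit[k][obs[t + 1]] * bwd[t + 1][k]
                for k in range(n_states)
            ) / scales[t + 1]

    # With this scaling, fwd * bwd is already the normalized posterior.
    return [[fwd[t][k] * bwd[t][k] for k in range(n_states)] for t in range(n_sites)]


# Toy example (all numbers hypothetical). Hidden states are the three topologies.
TOPOLOGIES = ["((H,C),G)", "((H,G),C)", "((C,G),H)"]
init = [0.70, 0.15, 0.15]
trans = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
# Observation codes: 0 = uninformative site, 1/2/3 = site pattern supporting
# the first/second/third topology.
emit = [
    [0.78, 0.20, 0.01, 0.01],
    [0.78, 0.01, 0.20, 0.01],
    [0.78, 0.01, 0.01, 0.20],
]
obs = [1, 0, 1, 0, 2, 2, 0, 2, 2, 0, 1, 0, 1]

posterior = posterior_decoding(init, trans, emit, obs)
for site, probs in enumerate(posterior):
    best = max(range(len(probs)), key=probs.__getitem__)
    print(site, TOPOLOGIES[best], round(probs[best], 3))
```

Because the result is a full probability vector per site rather than a single best path, per-site posteriors like these are the kind of quantity on which a genome-wide scan could be built.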
genetics & heredity