adaPop: Bayesian inference of dependent population dynamics in coalescent models
Lorenzo Cappello, Jaehee Kim, Julia A. Palacios
DOI: https://doi.org/10.1371/journal.pcbi.1010897
2023-03-21
PLoS Computational Biology
Abstract: The coalescent is a powerful statistical framework that allows us to infer past population dynamics by leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share an evolutionary history and therefore become dependent. Inferring such dependence is a highly important, yet challenging, problem. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data to tackle this problem. Here, we present adaPop, a probabilistic model to estimate the past population dynamics of dependent populations and to quantify their degree of dependence. An essential feature of our approach is the ability to track the time-varying association between the populations while making minimal assumptions about their functional shapes via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast, scalable inference algorithms. We test our method using simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on the evolutionary histories of different variants of SARS-CoV-2.

Genomic data provide information about evolutionary dynamics, such as those of an evolving epidemic or of tumorigenesis, that is difficult to infer from other sources of data. One of the main computational challenges in inferring past population histories is to jointly model dependent subpopulations and to correctly quantify their dependencies over time. When distinct subpopulations share common ancestry in the past and evolve under shared environmental pressure, their population dynamics become dependent.
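The abstract says the time-varying association between populations is modeled with Markov random field priors but does not spell out the construction. As a minimal sketch, assuming a first-order Gaussian random walk (RW1), one of the simplest Markov random field priors on a one-dimensional time grid, the code below draws two dependent log effective population size trajectories: log N1(t) follows an RW1 prior, and log N2(t) equals log N1(t) plus a deviation that has its own RW1 prior. This construction, the function names, the starting size of 1000, and the precisions `tau_shared` and `tau_diff` are hypothetical illustrations, not adaPop's actual parameterization.

```python
import math
import random


def rw1_trajectory(n_points, start, precision, rng):
    """Draw one path from a first-order random-walk (RW1) prior.

    Successive increments are i.i.d. Normal(0, 1/precision), the simplest
    Gaussian Markov random field on a 1-D time grid.
    """
    sd = 1.0 / math.sqrt(precision)
    path = [start]
    for _ in range(n_points - 1):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


def rw1_log_density(path, precision):
    """Log density of a path under the RW1 prior, conditional on path[0]."""
    sq = sum((b - a) ** 2 for a, b in zip(path, path[1:]))
    m = len(path) - 1  # number of increments
    return 0.5 * m * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * sq


def dependent_log_sizes(n_points, rng, tau_shared=4.0, tau_diff=25.0):
    """Hypothetical construction of two dependent log effective size paths.

    log N1(t) follows an RW1 prior; log N2(t) = log N1(t) + d(t), where the
    deviation d(t) starts at 0 and has its own RW1 prior.
    """
    log_n1 = rw1_trajectory(n_points, math.log(1000.0), tau_shared, rng)
    deviation = rw1_trajectory(n_points, 0.0, tau_diff, rng)
    log_n2 = [a + d for a, d in zip(log_n1, deviation)]
    return log_n1, log_n2


# Usage: one prior draw of two dependent trajectories on a 50-point grid.
rng = random.Random(42)
log_n1, log_n2 = dependent_log_sizes(n_points=50, rng=rng)
n1 = [math.exp(v) for v in log_n1]
n2 = [math.exp(v) for v in log_n2]
```

A larger `tau_diff` penalizes changes in the deviation more strongly, so the two trajectories stay closer together (strong association); lowering it lets them drift apart over time.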
In this work, we propose an efficient inference method for studying dependent population dynamics from genetic data in the coalescent framework, an approach that models the stochastic process of "coalescence" of genealogical lineages traveling back in time to explain the statistical properties of a sample's genetic variation. We also extend our framework to jointly model the ancestral and sampling processes, incorporating sampling times as an additional source of information. We validate our methods via extensive simulations and demonstrate that they provide new insights into the evolutionary dynamics of novel SARS-CoV-2 variants.
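The coalescent process described above can be made concrete with a textbook Kingman coalescent sketch. Assuming isochronous sampling (all lineages sampled at time zero) and a constant effective population size `ne`, the waiting time while k lineages remain is exponential with rate k(k-1)/(2 ne). The function names are hypothetical, and this is a standard illustration, not adaPop's inference machinery, which handles time-varying, dependent population sizes and, in its extension, sampling times.

```python
import math
import random


def simulate_coalescent_times(n_samples, ne, rng):
    """Coalescent event times (backwards from the present) under a
    constant effective population size ne, all samples taken at time 0."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    t = 0.0
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0 / ne  # C(k, 2) / ne
        t += rng.expovariate(rate)
        times.append(t)
    return times


def coalescent_log_likelihood(times, n_samples, ne):
    """Log-likelihood of ordered coalescent times under constant ne."""
    loglik = 0.0
    prev = 0.0
    for k, t in zip(range(n_samples, 1, -1), times):
        rate = k * (k - 1) / 2.0 / ne
        loglik += math.log(rate) - rate * (t - prev)
        prev = t
    return loglik


def ne_mle(times, n_samples):
    """Closed-form MLE of constant ne: mean of C(k, 2) times waiting time."""
    total = 0.0
    prev = 0.0
    for k, t in zip(range(n_samples, 1, -1), times):
        total += k * (k - 1) / 2.0 * (t - prev)
        prev = t
    return total / (n_samples - 1)


def expected_tmrca(n_samples, ne):
    """E[time to most recent common ancestor] = 2 * ne * (1 - 1/n)."""
    return 2.0 * ne * (1.0 - 1.0 / n_samples)


# Usage: simulate one genealogy's coalescent times and re-estimate ne.
rng = random.Random(1)
times = simulate_coalescent_times(n_samples=500, ne=1000.0, rng=rng)
estimate = ne_mle(times, 500)
```

The closed-form `ne_mle` comes from setting the derivative of the log-likelihood to zero. Once the size varies over time, the rate becomes k(k-1)/(2 Ne(t)) and the waiting times are no longer simple exponentials in calendar time, which is where smoothing priors on the trajectory come in.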
biochemical research methods, mathematical & computational biology