Abstract: The coalescent is a powerful statistical framework that allows us to infer past population dynamics by leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share evolutionary history and therefore become dependent. Inferring such dependence is an important yet challenging problem. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data to tackle this problem. Here, we present adaPop, a probabilistic model to estimate past population dynamics of dependent populations and to quantify their degree of dependence. An essential feature of our approach is the ability to track the time-varying association between the populations while making minimal assumptions on their functional shapes via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast, scalable inference algorithms. We test our method using simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on the evolutionary histories of different variants of SARS-CoV-2.

Genomic data provide information about evolutionary dynamics, such as an evolving epidemic or tumorigenesis, that is difficult to infer from other sources of data. One of the main computational challenges in inferring past population histories is to jointly model dependent subpopulations and correctly quantify their dependencies over time. When distinct subpopulations have common ancestry in the past and evolve under shared environmental pressure, their population dynamics become dependent.
In this work, we propose an efficient inference method for studying dependent population dynamics from genetic data in the coalescent framework: an approach that considers the stochastic process of the "coalescence" of genealogical lineages traveling back in time to explain the statistical properties of a sample's genetic variation. We also extend our framework to jointly model the ancestral and sampling processes, incorporating sampling times as an additional source of information. We validate our methods via extensive simulations and demonstrate that they provide new insights into the evolutionary dynamics of novel SARS-CoV-2 variants.
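
To make the idea of lineages coalescing backward in time concrete, the following is a minimal sketch of the standard Kingman coalescent for a single population of constant effective size. It is not the adaPop model, which handles dependent, time-varying populations. The function name `simulate_coalescent_times` and the parameters `n_samples` and `effective_size` are illustrative assumptions, and time is measured in units where each pair of lineages coalesces at rate `1 / effective_size`.

```python
import random
from math import comb


def simulate_coalescent_times(n_samples, effective_size, rng=None):
    """Simulate coalescent event times, measured backward from the present.

    Assumes a single population of constant effective size (standard
    Kingman coalescent). While k lineages remain, the waiting time to the
    next coalescence is exponential with rate C(k, 2) / effective_size.
    """
    rng = rng or random.Random()
    times = []
    t = 0.0
    # Start with n_samples lineages; each event merges two of them into one.
    for k in range(n_samples, 1, -1):
        rate = comb(k, 2) / effective_size
        t += rng.expovariate(rate)
        times.append(t)
    return times


# Hypothetical example: 10 sampled sequences, effective size 1000.
times = simulate_coalescent_times(10, 1000.0, random.Random(1))
print(len(times), "coalescent events; time to most recent common ancestor:", times[-1])
```

Under these assumptions, smaller effective sizes give shorter waiting times between coalescent events, which is why the spacing of events in a reconstructed genealogy carries information about past population size.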

Accelerated Bayesian inference of population size history from recombining sequence data

Bayesian Inference of Dependent Population Dynamics in Coalescent Models

An Efficient Bayesian Inference Framework for Coalescent-Based Nonparametric Phylodynamics

Multiple merger coalescent inference of effective population size

Decoding coalescent hidden Markov models in linear time

adaPop: Bayesian inference of dependent population dynamics in coalescent models

Bayesian Inference of Pathogen Phylogeography using the Structured Coalescent Model

Accurate and flexible estimation of effective population size history

Exact Limits of Inference in Coalescent Models

Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent

Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories

Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates

Sampling through time and phylodynamic inference with coalescent and birth-death models

An Efficient Coalescent Epoch Model for Bayesian Phylogenetic Inference

Accurate inference of population history in the presence of background selection

Limits and convergence properties of the sequentially Markovian coalescent

Bayesian Inference of Species Trees from Multilocus Data

Computing the joint distribution of the total tree length across loci in populations with variable size

Inferring the demographic history from DNA sequences: An importance sampling approach based on non-homogeneous processes

SelNeTime: a python package inferring effective population size and selection intensity from genomic time series data

A Principled Approach to Deriving Approximate Conditional Sampling Distributions in Population Genetics Models with Recombination