An R package for nonparametric inference on dynamic populations with infinitely many types

Filippo Ascolani, Stefano Damato, Matteo Ruggiero
2024-09-24
Abstract: Fleming-Viot diffusions are widely used stochastic models for population dynamics which extend the celebrated Wright-Fisher diffusions. They describe the temporal evolution of the relative frequencies of the allelic types in an ideally infinite panmictic population, whose individuals undergo random genetic drift and at birth can mutate to a new allelic type drawn from a possibly infinite potential pool, independently of their parent. Recently, Bayesian nonparametric inference has been considered for this model when a finite sample of individuals is drawn from the population at several discrete time points. Previous works have fully described the relevant estimators for this problem, but current software is available only for the Wright-Fisher finite-dimensional case. Here we provide software for the general case, overcoming some nontrivial computational challenges posed by this setting. The R package FVDDPpkg efficiently approximates the filtering and smoothing distributions for Fleming-Viot diffusions, given finite samples of individuals collected at different times. A suitable Monte Carlo approximation is also introduced in order to reduce the computational cost.
Computation, Probability, Populations and Evolution, Quantitative Methods, Applications
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Developing Software Tools**: The main goal of the paper is to provide an R package (FVDDPpkg) for nonparametric inference on dynamic populations with infinitely many types. The package overcomes nontrivial computational challenges that arise in the general case, whereas existing software covers only the finite-dimensional Wright-Fisher case.
2. **Filtering and Smoothing Distributions**: The package efficiently approximates the filtering and smoothing distributions of the Fleming-Viot diffusion, given finite samples of individuals drawn from the population at different time points. A Monte Carlo approximation is also provided to reduce the computational cost.
3. **Allele Frequency Inference**: Within a hidden Markov model framework, researchers can use the collected data to infer how allele frequencies in the population change over time. They can also assess the probability that allele types already recorded will be observed again in individuals sampled in the future.

In summary, the paper aims to provide a practical tool for efficient statistical analysis of population genetic data evolving over time, especially when the number of possible types is infinite.
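
The probability of re-observing a recorded allele type can be illustrated in a much simpler setting than the one the package handles. The Python sketch below is not FVDDPpkg's API and ignores the temporal dynamics entirely. It assumes a single Dirichlet process posterior with total mass `theta` and a nonatomic base measure, for which the Blackwell-MacQueen urn gives the predictive probabilities in closed form: after `n` observations, a type seen `n_j` times recurs with probability `n_j / (theta + n)`, and a previously unseen type appears with probability `theta / (theta + n)`. The function name `dp_predictive` and the example counts are hypothetical.

```python
from fractions import Fraction


def dp_predictive(counts, theta):
    """Predictive probabilities under a single Dirichlet process posterior.

    counts: dict mapping each observed allele type to its multiplicity.
    theta:  total mass of the Dirichlet process (int or Fraction, > 0).
    Returns (seen, new): seen[type] is the probability that the next
    sampled individual carries that type; new is the probability that
    it carries a type not observed so far.
    """
    n = sum(counts.values())
    total = Fraction(theta) + n
    # Blackwell-MacQueen urn: each observed type is weighted by its count.
    seen = {allele: Fraction(c) / total for allele, c in counts.items()}
    # The remaining mass goes to a fresh draw from the base measure.
    new = Fraction(theta) / total
    return seen, new


# Hypothetical sample: allele "A" observed 3 times, allele "B" once.
seen, new = dp_predictive({"A": 3, "B": 1}, theta=1)
for allele, prob in seen.items():
    print(allele, prob)
print("new type", new)
```

Exact fractions are used so that the probabilities visibly sum to one. Handling data collected at several time points would require propagating such distributions through time, which this sketch does not attempt.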