TreePPL: A Universal Probabilistic Programming Language for Phylogenetics
Viktor E Senderov,Jan Kudlicka,Daniel Lunden,Viktor Palmkvist,Mariana P Braga,Emma Granqvist,Gizem çaylak,Thimothée Virgoulay,David Broman,Fredrik E Ronquist
DOI: https://doi.org/10.1101/2023.10.10.561673
2024-11-13
Abstract:We present TreePPL, a universal probabilistic programming language (PPL) designed for probabilistic modeling and inference in phylogenetics. In TreePPL, the model is expressed as a computer program, which can generate simulations from the model conditioned on some input data. Specialized inference machinery then uses this program to estimate the posterior probability distribution. The aim is to allow the user to focus on describing the model, and provide the inference machinery for free. The TreePPL modeling language is meant to be familiar to users of R or Python, and utilizes a functional programming style that facilitates the application of generic inference algorithms. The model program can be conveniently compiled and run from a Python or R environment, which can be used for pre-processing, feeding the model with the observed data, controlling and running the inference, and receiving and post-processing the output data. The inference machinery is generated by a compiler framework developed specifically for supporting domain-specific modeling and inference, the Miking CorePPL framework. It currently supports a range of inference strategies---including sequential Monte Carlo, Markov chain Monte Carlo, and combinations thereof---and is based on several recent innovations that are important for efficient PPL inference on phylogenetic models. It also allows advanced users to implement novel inference strategies for models described using TreePPL or other domain-specific modeling languages. We briefly describe the TreePPL modeling language and the Python environment, and give some examples of modeling and inference with TreePPL. The examples illustrate how TreePPL can be used to address a range of common problem types considered in statistical phylogenetics, from diversification and tree inference to complex trait evolution. A few major challenges remain to be addressed before the phylogenetic model space is adequately covered by efficient automatic inference techniques, but several of them are being addressed in ongoing work on TreePPL. We end the paper by discussing how probabilistic programming can facilitate further use of machine learning in addressing important challenges in statistical phylogenetics.
Bioinformatics