Combinatorial Topological Models for Phylogenetic Networks and the Mergegram Invariant

Paweł Dłotko, Jan Felix Senge, Anastasios Stefanou
2024-08-27
Abstract: Mutations of genetic sequences are often accompanied by recombinations, and the resulting evolutionary histories are represented by phylogenetic networks. These networks are typically reconstructed from coalescent processes that may arise from optimally merging or fitting together a given set of phylogenetic trees. Nakhleh formulated the phylogenetic network reconstruction problem (PNRP): given a family of phylogenetic trees over a common set of taxa, is there a unique minimal phylogenetic network whose set of spanning trees contains the family? Inspired by ideas from topological data analysis (TDA), we devise lattice-diagram models for phylogenetic networks and filtrations, the cliquegram and the facegram, both generalizing the dendrogram (filtered partition) model of phylogenetic trees. Both models allow us to solve the PNRP rigorously. The solutions are obtained by taking the join of the dendrograms on the lattice of cliquegrams or facegrams. Furthermore, the join-facegram can be computed in time polynomial in the size and number of the input trees. Cliquegrams and facegrams can be challenging to work with when the number of taxa is large. We therefore propose a topological invariant of facegrams and filtrations, called the mergegram, by extending a construction by Elkin and Kurlin defined on dendrograms. We show that the mergegram is invariant under weak equivalences of filtrations, which in turn implies that it is a 1-Lipschitz stable invariant with respect to Mémoli's tripod distance. The mergegram can be used as a computable proxy for phylogenetic networks and filtrations of datasets. We illustrate the utility of these new TDA concepts in phylogenetics by performing experiments with artificial and biological data.
Algebraic Topology, Populations and Evolution
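The mergegram that the abstract extends from dendrograms to facegrams can be illustrated in the simplest case, a single dendrogram. The following is a minimal sketch, assuming the dendrogram setting of Elkin and Kurlin in which every cluster contributes one (birth, death) pair: birth is the height at which the cluster first appears, death is the height at which it merges into a larger cluster. The function name `mergegram`, the input encoding as a list of binary merges, and the convention of giving the root cluster death at infinity are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def mergegram(leaves, merges):
    """Return the multiset of (birth, death) pairs of a dendrogram.

    leaves: iterable of taxon labels, each born at height 0.
    merges: list of (height, block_a, block_b), where block_a and block_b
            are sets of taxa that are alive just before the merge.
            Assumption: merges are binary and listed in a valid order.
    """
    # Every singleton cluster is born at height 0.
    birth = {frozenset([x]): 0.0 for x in leaves}
    pairs = Counter()
    # Sort by height only; the stable sort keeps ties in input order.
    for height, a, b in sorted(merges, key=lambda m: m[0]):
        a, b = frozenset(a), frozenset(b)
        # Both merging clusters die at this height.
        for block in (a, b):
            pairs[(birth.pop(block), height)] += 1
        # Their union is born at the same height.
        birth[a | b] = height
    # Clusters that never merge (the root, by assumption) live forever.
    for block_birth in birth.values():
        pairs[(block_birth, math.inf)] += 1
    return pairs


# Hypothetical three-taxon tree: a and b merge at 1.0, then c joins at 2.0.
example = mergegram(
    ["a", "b", "c"],
    [(1.0, {"a"}, {"b"}), (2.0, {"a", "b"}, {"c"})],
)
```

In this toy tree the clusters {a} and {b} both contribute the pair (0, 1), {c} contributes (0, 2), {a, b} contributes (1, 2), and the root {a, b, c} contributes (2, inf). The result is a multiset of points in the plane, which is what makes it usable as a compact summary when the full diagram is too large to inspect directly.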