Phyloformer: Fast, accurate and versatile phylogenetic reconstruction with deep neural networks

Luca Nesterenko, Luc Blassel, Philippe Veber, Bastien Boussau, Laurent Jacob
DOI: https://doi.org/10.1101/2024.06.17.599404
2024-06-22
Abstract: Phylogenetic inference aims at reconstructing the tree describing the evolution of a set of sequences descending from a common ancestor. The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their usability under realistic evolutionary models. Harnessing recent advances in likelihood-free inference and geometric deep learning, we introduce Phyloformer, a fast and accurate method for evolutionary distance estimation and phylogenetic reconstruction. Sampling many trees and sequences under an evolutionary model, we train the network to learn a function that enables predicting the former from the latter. Under a commonly used model of protein sequence evolution and exploiting GPU acceleration, it outpaces fast distance methods while matching maximum likelihood accuracy on simulated and empirical data. Under more complex models, some of which include dependencies between sites, it outperforms other methods. Our results pave the way for the adoption of sophisticated realistic models for phylogenetic inference.
Bioinformatics
What problem does this paper attempt to address?
This paper introduces Phyloformer, a deep neural network method for fast and accurate evolutionary distance estimation and phylogenetic reconstruction. State-of-the-art maximum likelihood and Bayesian methods are computationally expensive, which limits their use under realistic evolutionary models. Phyloformer avoids the costly likelihood computation: many trees and sequences are simulated under an evolutionary model, and the network is trained on them to predict evolutionary distances from the sequences. Under a commonly used model of protein sequence evolution, and with GPU acceleration, it runs faster than fast distance methods while matching the accuracy of maximum likelihood on both simulated and empirical data. Under more complex models, some of which include dependencies between sites, it outperforms other methods. These results pave the way for phylogenetic inference under more sophisticated, realistic models.
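
The simulation step described above can be illustrated with a minimal sketch that produces one training pair: a set of leaf sequences as input and the pairwise tree distances between leaves as the target. Everything in it is an assumption made for illustration and not the paper's setup: the function names are hypothetical, the tree sampler simply joins random subtrees with exponentially distributed branch lengths, and the substitution model is a toy in which each site is redrawn uniformly with probability 1 - exp(-t) along a branch of length t. The network itself is omitted.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_tree(n_leaves, rng, mean_branch=0.1):
    """Join random pairs of subtrees until a single root remains.

    A node is (children, leaf_id): children is a list of
    (child_node, branch_length) pairs; leaf_id is None for internal nodes.
    """
    nodes = [([], i) for i in range(n_leaves)]
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        children = [
            (a, rng.expovariate(1 / mean_branch)),
            (b, rng.expovariate(1 / mean_branch)),
        ]
        nodes.append((children, None))
    return nodes[0]


def evolve(seq, t, rng):
    """Toy model (assumption): redraw each site with probability 1 - exp(-t)."""
    p = 1.0 - math.exp(-t)
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < p else c for c in seq
    )


def simulate(node, seq, rng, leaves):
    """Evolve seq down the tree and record the sequence reaching each leaf."""
    children, leaf_id = node
    if leaf_id is not None:
        leaves[leaf_id] = seq
    for child, t in children:
        simulate(child, evolve(seq, t, rng), rng, leaves)


def leaf_depths(node, dist):
    """Return {leaf_id: path length from node}; fill dist with leaf-to-leaf distances."""
    children, leaf_id = node
    if leaf_id is not None:
        return {leaf_id: 0.0}
    merged = {}
    for child, t in children:
        below = {k: d + t for k, d in leaf_depths(child, dist).items()}
        # Leaves in different child subtrees are connected through this node.
        for (i, di), (j, dj) in itertools.product(merged.items(), below.items()):
            dist[(min(i, j), max(i, j))] = di + dj
        merged.update(below)
    return merged


def make_training_example(n_leaves=5, seq_len=50, seed=0):
    """Return (leaf sequences, pairwise tree distances) for one simulated tree."""
    rng = random.Random(seed)
    tree = sample_tree(n_leaves, rng)
    root_seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(seq_len))
    leaves, dist = {}, {}
    simulate(tree, root_seq, rng, leaves)
    leaf_depths(tree, dist)
    return [leaves[i] for i in range(n_leaves)], dist
```

Calling `make_training_example()` many times with different seeds would yield a dataset of (sequences, distances) pairs. A regression model fitted to such pairs never evaluates a likelihood, which is the sense in which the approach is likelihood-free; only the ability to simulate from the evolutionary model is required.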