Improved Variational Bayesian Phylogenetic Inference using Mixtures

Oskar Kviman, Ricky Molén, Jens Lagergren
2023-10-02
Abstract: We present VBPI-Mixtures, an algorithm designed to enhance the accuracy of phylogenetic posterior distributions, particularly for tree-topology and branch-length approximations. Despite the Variational Bayesian Phylogenetic Inference (VBPI), a leading-edge black-box variational inference (BBVI) framework, achieving remarkable approximations of these distributions, the multimodality of the tree-topology posterior presents a formidable challenge to sampling-based learning techniques such as BBVI. Advanced deep learning methodologies such as normalizing flows and graph neural networks have been explored to refine the branch-length posterior approximation, yet efforts to ameliorate the posterior approximation over tree topologies have been lacking. Our novel VBPI-Mixtures algorithm bridges this gap by harnessing the latest breakthroughs in mixture learning within the BBVI domain. As a result, VBPI-Mixtures is capable of capturing distributions over tree-topologies that VBPI fails to model. We deliver state-of-the-art performance on difficult density estimation tasks across numerous real phylogenetic datasets.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to approximate the posterior distributions over tree topologies and branch lengths more accurately in Bayesian phylogenetic inference. Existing variational Bayesian phylogenetic inference (VBPI) produces strong approximations of these distributions, but it struggles with the multimodality of the tree-topology posterior. The tree-topology space is very large and complex, so both traditional sampling methods such as Markov chain Monte Carlo (MCMC) and existing variational inference methods have difficulty exploring it fully. To address this, the authors propose VBPI-Mixtures, which applies recent mixture-learning techniques from black-box variational inference (BBVI) to the approximation over tree topologies. With several mixture components working together, VBPI-Mixtures can capture multiple peaks in the tree-topology posterior and so approximate it more accurately.

### Specific problem description

1. **Challenges of multimodal posterior distributions**:
   - The posterior distribution over tree topologies is usually multimodal: it has several separated high-probability regions. A single variational approximation such as VBPI often captures only some of these modes and does not cover the whole posterior.
   - Missing modes makes the estimated posteriors over tree topologies and branch lengths inaccurate, which reduces the accuracy of the phylogenetic analysis.
2. **Limitations of existing methods**:
   - **MCMC methods**: MCMC can in theory explore the entire posterior distribution, but in practice it requires very long running times and is prone to getting trapped in local modes.
   - **Variational inference methods**: Existing variational inference methods such as VBPI are computationally efficient but perform poorly on multimodal distributions, because a single approximation trained from samples tends to concentrate on a subset of the modes.

### Solutions

- **VBPI-Mixtures algorithm**: The algorithm replaces the single approximation with a mixture of several components that jointly approximate the posterior over tree topologies. Each component can specialize on a different mode, so the mixture as a whole covers the posterior more completely.
- **Enhanced flexibility**: The mixture is a more flexible variational family, so it can adapt better to the complex structure of the posterior distribution.
- **Improved exploration ability**: Because the components work together, VBPI-Mixtures explores the tree-topology space more effectively than a single model, which may explore it insufficiently.

### Experimental verification

- **Synthetic data experiments**: Using complex hierarchical classification target distributions, the authors show that the mixture components specialize on different parts of the solution space and that the mixture achieves a smaller Kullback-Leibler divergence than a single model.
- **Real-data experiments**: On eight popular real datasets, VBPI-Mixtures outperforms existing state-of-the-art methods in both marginal log-likelihood estimation and approximation of the tree-topology posterior.

In conclusion, the paper addresses the challenges posed by the multimodality of the tree-topology posterior in Bayesian phylogenetic inference by proposing the VBPI-Mixtures algorithm, and it reports improved accuracy of the resulting posterior approximations.
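
The intuition behind the mixture approach can be illustrated with a minimal toy sketch. It is not the authors' method: the four-state target, the factorized component family, the uniform mixture weights, the grid search (used here in place of gradient-based training), and every name in the code are assumptions chosen only for illustration. The four joint states of two binary choices stand in for tree topologies, and the target places most of its mass on two separated states. A single factorized approximation cannot represent both modes at once, while a two-component mixture of the same family can.

```python
import itertools
import math

# Hypothetical toy target over two binary choices (a, b). The four joint states
# stand in for tree topologies; mass concentrates on (0, 0) and (1, 1).
TARGET = {(0, 0): 0.48, (0, 1): 0.02, (1, 0): 0.02, (1, 1): 0.48}


def factorized(theta_a, theta_b):
    """Single component: q(a, b) = Bernoulli(a; theta_a) * Bernoulli(b; theta_b)."""
    return {
        (a, b): (theta_a if a else 1 - theta_a) * (theta_b if b else 1 - theta_b)
        for a in (0, 1)
        for b in (0, 1)
    }


def mixture(components):
    """Uniformly weighted mixture of component distributions (an assumption)."""
    return {s: sum(c[s] for c in components) / len(components) for s in TARGET}


def kl(q, p):
    """KL(q || p), the divergence direction minimized in variational inference."""
    return sum(q[s] * math.log(q[s] / p[s]) for s in q if q[s] > 0)


# Coarse parameter grid 0.05, 0.10, ..., 0.95; grid search replaces training here.
GRID = [i / 20 for i in range(1, 20)]
CANDIDATES = [factorized(ta, tb) for ta, tb in itertools.product(GRID, repeat=2)]


def best_single():
    """Smallest KL achievable on the grid by one factorized component."""
    return min(kl(q, TARGET) for q in CANDIDATES)


def best_mixture():
    """Smallest KL achievable on the grid by a two-component uniform mixture."""
    pairs = itertools.combinations_with_replacement(CANDIDATES, 2)
    return min(kl(mixture(pair), TARGET) for pair in pairs)


single_kl = best_single()
mixture_kl = best_mixture()
print(f"best single-component KL: {single_kl:.4f}")
print(f"best two-component KL:    {mixture_kl:.4f}")
```

In this sketch the best single component has to commit to one mode or spread mass over low-probability states, whereas the mixture assigns one component to each mode and reaches a much smaller divergence. This mirrors, at toy scale, the specialization behavior described for the synthetic experiments.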