Abstract: Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this paper, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies. Here the 'likelihood' of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method 'phylogenetic topographer' (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a non-blocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.
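
One way to read "normalized topology likelihoods" is as each topology's optimized likelihood divided by the sum over the explored high likelihood set. The notation below is an assumption introduced for illustration and does not come from the abstract: S is the explored set, l(tau) is the optimized log-likelihood of topology tau, l_max is the best value observed, and c is the log-likelihood threshold.

```latex
\[
S = \{\tau : \ell(\tau) \ge \ell_{\max} - c\},
\qquad
\hat{p}(\tau)
  = \frac{\exp \ell(\tau)}{\sum_{\tau' \in S} \exp \ell(\tau')}
  = \frac{\exp\bigl(\ell(\tau) - \ell_{\max}\bigr)}
         {\sum_{\tau' \in S} \exp\bigl(\ell(\tau') - \ell_{\max}\bigr)}
\]
```

The second form subtracts the maximum before exponentiating, which leaves the ratio unchanged and avoids numerical underflow for the very negative log-likelihoods typical of phylogenetic data.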

What problem does this paper attempt to address?

The paper addresses the inefficiency of exploring phylogenetic tree space within the Bayesian framework. Traditional Bayesian Markov chain Monte Carlo (MCMC) explores tree space slowly, partly because it often returns to tree topologies it has already visited. The difficulty is greatest when the data set is highly informative and the posterior probability is concentrated on a few tree topologies, because most random modifications then lead to topologies with low posterior probability.

To solve this problem, the paper proposes an efficient parallelized method, the Phylogenetic Topographer (PT), which maps out the set of high-likelihood phylogenetic tree topologies through systematic search and ensures that no topology is visited more than once. PT starts from multiple local maxima and applies local tree rearrangements, such as Nearest-Neighbor Interchange (NNI), continuing only through topologies whose likelihood lies within a chosen threshold below that of the best topology observed so far. This quickly identifies a high-likelihood set that contains the credible set, and a non-blocking hash table prevents repeated visits when multiple threads explore tree space concurrently.

With this approach, PT can approximate the Bayesian consensus tree topology directly. When combined with an accurate means of estimating per-topology marginal likelihoods, it provides an alternative procedure for obtaining the Bayesian posterior distribution on phylogenetic tree topologies. The paper demonstrates the effectiveness of PT experimentally: on standard test data sets, its results agree closely with those of MrBayes.
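
The threshold-limited search described above can be sketched in a few lines. This is a minimal single-threaded illustration, not the authors' implementation: the function name `explore_high_likelihood_set`, its parameters, and the toy "tree space" (integers on a line with a made-up log-likelihood) are all assumptions for the example. An ordinary dictionary stands in for the non-blocking hash table, and a generic `neighbors` callback stands in for NNI rearrangements.

```python
from collections import deque
import math


def explore_high_likelihood_set(starts, neighbors, log_lik, threshold):
    """Explore outward from `starts`, expanding only good-enough topologies.

    starts    : iterable of hashable topology keys (e.g. local maxima)
    neighbors : function key -> iterable of adjacent keys (stand-in for NNI)
    log_lik   : function key -> float (stand-in for the log-likelihood
                with optimized branch lengths)
    threshold : expand a topology only if its log-likelihood is within
                this many units of the best value seen so far
    """
    visited = {}          # key -> log-likelihood; also the "already seen" table
    queue = deque()
    best = -math.inf

    for s in starts:
        if s not in visited:
            visited[s] = log_lik(s)
            best = max(best, visited[s])
            queue.append(s)

    while queue:
        topo = queue.popleft()
        # The best value may have risen since this topology was enqueued.
        if visited[topo] < best - threshold:
            continue
        for nb in neighbors(topo):
            if nb in visited:
                continue  # each topology is evaluated at most once
            visited[nb] = log_lik(nb)
            best = max(best, visited[nb])
            if visited[nb] >= best - threshold:
                queue.append(nb)

    # Keep only topologies inside the final threshold band.
    return {k: v for k, v in visited.items() if v >= best - threshold}


# Toy stand-in for tree space: integers 0..30, adjacent integers are neighbors,
# and the (hypothetical) log-likelihood peaks at 10.
def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j <= 30]


def toy_log_lik(i):
    return -float(abs(i - 10))


high_set = explore_high_likelihood_set([10], toy_neighbors, toy_log_lik, 3.0)
print(sorted(high_set))
```

In a parallel version, the membership check and insertion into `visited` would need to be a single atomic operation so that two threads cannot both claim the same topology; the sketch leaves that out.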

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies
