Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion

Derrick J. Zwickl
Abstract:Phylogenetic trees have a multitude of applications in biology, epidemiology, conservation and even forensics. However, the inference of phylogenetic trees can be extremely computationally intensive. The computational burden of such analyses becomes even greater when model-based methods are used. Model-based methods have been repeatedly shown to be the most accurate choice for the reconstruction of phylogenetic trees, and thus are an attractive choice despite their high computational demands. Using the Maximum Likelihood (ML) criterion to choose among phylogenetic trees is one commonly used model-based technique. Until recently, software for performing ML analyses of biological sequence data was largely intractable for more vi than about one hundred sequences. Because advances in sequencing technology now make the assembly of datasets consisting of thousands of sequences common, ML search algorithms that are able to quickly and accurately analyze such data must be developed if ML techniques are to remain a viable option in the future. I have developed a fast and accurate algorithm that allows ML phylogenetic searches to be performed on datasets consisting of thousands of sequences. My software uses a genetic algorithm approach, and is named GARLI (Genetic Algorithm for Rapid Likelihood Inference). The speed of this new algorithm results primarily from its novel technique for partial optimization of branch-length parameters following topological rearrangements. Experiments performed with GARLI show that it is able to analyze large datasets in a small fraction of the time required by the previous generation of search algorithms. The program also performs well relative to two other recently introduced fast ML search programs. Large parallel computer clusters have become common at academic institutions in recent years, presenting a new resource to be used for phylogenetic analyses. The P-GARLI algorithm extends the approach of GARLI to allow simultaneous use of many computer processors. The processors may be instructed to work together on a phylogenetic search in either a highly coordinated or largely independent fashion.
Biology,Computer Science
What problem does this paper attempt to address?