Basic Phylogenetic Combinatorics. Systematic Biology Advance Access published 8,
A. Dress, K. Huber, J. Koolen, V. Moulton
2012-01-01
Abstract: Genomic data have become more and more important in biology, and methods that analyze these data are often based on mathematical ideas. Indeed, in recent years mathematics has become increasingly valuable for biology. The book Basic Phylogenetic Combinatorics, however, clearly shows that the converse is also the case. Evolutionary biology has inspired an exciting new field of mathematics that comprises many beautiful and challenging mathematical questions which have important real-life applications.

This book has not been written for biologists. After a short, clear preface in words, the main part of the book is full of mathematical notation, formulas, and proofs. Even for mathematicians, the book might not be as easy to read as the first word of the title suggests. Moreover, the book might not be of direct interest to biologists because it does not describe many methods or algorithms that can be immediately applied to biological data, nor does it describe many applications or refer to papers where such applications can be found. For this, you can consult the book of Felsenstein (2004).

This does not mean that Basic Phylogenetic Combinatorics has no relevance for biology. On the contrary, the book describes the mathematical theory that forms the foundation of computational phylogenetics. The theory described in this book has been the basis for numerous methods and algorithms that can be used by biologists. A good example is given at the very end of the book, where the QNet method is described. This method can be used to construct a phylogenetic “split network” from a collection of quartet trees. Such a network is a more informative representation of phylogenetic data than a tree because “reticulate” regions are used to visualize parts of the data that are not tree-like. This is illustrated by an application to some Salmonella data.
Such split networks are used more and more often by biologists to display evolutionary data for which no well-supported phylogenetic tree exists. To give just one example from the literature, Silver et al. (2011) used split networks generated by the QNet method to study the evolutionary history of the bacterium Aeromonas veronii, which can be a pathogen for humans but can also be a symbiont for medicinal leeches. They used the constructed networks to identify which groups of strains evolved down a well-resolved tree and in which groups horizontal gene transfer is likely to have occurred. This was used to show that horizontal gene transfer can occur at high frequency even for strains that are adapted to distinct niches.

For mathematicians, especially those with an interest in combinatorics, this book is an excellent way to learn all about phylogenetic combinatorics, which can roughly be described as discrete mathematics related to phylogenetic trees and other discrete mathematical objects related to phylogenetics. In mathematical terms, the field concerns leaf-labeled trees and, more generally, leaf-labeled graphs. These structures are sufficiently general to have more applications than the ones in evolutionary biology, including applications to other evolutionary processes (e.g., language evolution) and other types of data analysis (e.g., voting patterns, word clouds). Nevertheless, the main application of phylogenetic combinatorics is, and might always be, evolutionary biology.

The book is almost completely self-contained. The first chapter describes the basic mathematics that is used in the rest of the book. Then the phylogenetic combinatorics begins, starting with standard, well-known theorems and continuing all the way to some of the latest results in the field. The book does not just give an overview of results but provides complete and often elegant proofs of all of the theorems and lemmas.
References are provided to give the reader the chance to read more about a specific subject, but it is never necessary to read these references to understand the messages in the book.

The book concerns 3 “encodings” of unrooted phylogenetic trees, namely, splits, quartets, and metrics. To explain what this means, let us first consider splits. A split is a division of the taxa (the labels of the leaves) into 2 groups. Each branch of a phylogenetic tree describes one split. That is, it divides the taxa into those that are on one side of the branch and those that are on the other side of the branch. Now, suppose that you are given all of the splits described by the branches of some phylogenetic tree. Then, it is possible to uniquely reconstruct this phylogenetic tree from those splits. This is what is meant by saying that splits are an encoding of phylogenetic trees. A quartet of a phylogenetic tree is the restriction of the tree to 4 of its taxa. Given all of the quartets of some phylogenetic tree, it is again possible to uniquely reconstruct the tree. Hence, quartets also form an encoding of phylogenetic trees. Finally, metrics (i.e., distances) are a third encoding of phylogenetic trees. If one knows the path-length distance between each pair of