TINNiK: Inference of the Tree of Blobs of a Species Network Under the Coalescent

Elizabeth S. Allman, Hector Baños, Jonathan D. Mitchell, John A. Rhodes
DOI: https://doi.org/10.1101/2024.04.20.590418
2024-04-24
Abstract: The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the R package.
Evolutionary Biology
What problem does this paper attempt to address?
The paper addresses the problem of inferring the "tree of blobs" of a species network. A species network contains not only tree-like evolutionary relationships among species but also reticulate structures arising from events such as hybridization or horizontal gene transfer. The regions of the network that contain these reticulations are called "blobs"; they mark where genetic material has moved laterally between populations. Inferring the detailed structure of an entire species network is difficult, especially when the dataset is large and computing resources are limited. The paper therefore proposes TINNiK, an algorithm for statistically consistent inference of the tree of blobs from gene quartet distributions: it retains only the tree-like parts of the network and collapses each blob into a single node. The result can serve as a starting point for more detailed investigation, or indicate the limit of what can be inferred without additional assumptions.

The main contributions of the paper are as follows:

1. **Theoretical basis**: The work builds on the authors' theoretical results under the Network Multispecies Coalescent (NMSC) model, which establish that the tree of blobs is identifiable from gene quartet distributions.
2. **Algorithm development**: The TINNiK algorithm infers the tree of blobs of a species network from multi-gene data by combining statistical tests with combinatorial inference rules.
3. **Application verification**: The algorithm is applied to both simulated and empirical datasets to illustrate its use in practice.

With this method, researchers can obtain a simplified but informative summary of a species network without resolving the details of its reticulate structures, providing a basis for further research.
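To make "collapsing each blob into a single node" concrete, here is a minimal Python sketch that builds a tree of blobs from a network that is already known. It is not the TINNiK algorithm, which infers the tree from gene quartet data without access to the network. The sketch makes two assumptions: the network is treated as an undirected graph, and a blob is taken to be a maximal set of nodes that stay connected after all cut edges (bridges) are removed. The function name `tree_of_blobs` and the toy network are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def tree_of_blobs(edges):
    """Contract every blob of an undirected network into one node.

    `edges` is a list of (u, v) pairs. Returns a set of tree edges, each a
    frozenset of two node labels; a label is the frozenset of the original
    nodes merged into that tree node.
    Illustrative sketch only: degree-2 nodes are not suppressed.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    # Step 1: find bridges with a depth-first search (low-link method).
    disc, low, bridges = {}, {}, set()

    def dfs(node, parent_edge):
        disc[node] = low[node] = len(disc)
        for nbr, idx in adj[node]:
            if idx == parent_edge:
                continue
            if nbr in disc:
                low[node] = min(low[node], disc[nbr])
            else:
                dfs(nbr, idx)
                low[node] = min(low[node], low[nbr])
                if low[nbr] > disc[node]:
                    bridges.add(idx)  # no back edge spans this edge

    for node in list(adj):
        if node not in disc:
            dfs(node, None)

    # Step 2: group nodes that remain connected once bridges are removed.
    label = {}
    for start in list(adj):
        if start in label:
            continue
        members, stack, seen = [start], [start], {start}
        while stack:
            x = stack.pop()
            for y, idx in adj[x]:
                if idx not in bridges and y not in seen:
                    seen.add(y)
                    members.append(y)
                    stack.append(y)
        group = frozenset(members)
        for m in members:
            label[m] = group

    # Step 3: the bridges, relabeled by group, are the edges of the tree.
    return {frozenset({label[u], label[v]})
            for idx, (u, v) in enumerate(edges) if idx in bridges}


# Toy network: a 4-cycle u-v-w-x (one blob) with pendant taxa a, b, c,
# and a tree-like part x-y leading to taxa d and e.
network_edges = [("a", "u"), ("b", "v"), ("c", "w"),
                 ("u", "v"), ("v", "w"), ("w", "x"), ("x", "u"),
                 ("x", "y"), ("y", "d"), ("y", "e")]
blob_tree = tree_of_blobs(network_edges)
for edge in blob_tree:
    print(sorted(sorted(side) for side in edge))
```

In the toy network the four cycle nodes merge into one tree node, while every other node keeps its own singleton label. The recursive search is adequate for small examples; a large network would call for an iterative version.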