Towards reliable detection of introgression in the presence of among-species rate variation

Thore Koppetsch, Milan Malinsky, Michael Matschiner
DOI: https://doi.org/10.1093/sysbio/syae028
IF: 9.16
2024-06-25
Systematic Biology
Abstract: The role of interspecific hybridization has recently seen increasing attention, especially in the context of diversification dynamics. Genomic research has now made it abundantly clear that both hybridization and introgression – the exchange of genetic material through hybridization and backcrossing – are far more common than previously thought. Besides cases of ongoing or recent genetic exchange between taxa, an increasing number of studies report "ancient introgression" – referring to results of hybridization that took place in the distant past. However, it is not clear whether commonly used methods for the detection of introgression are applicable to such old systems, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species, affected by recent or ongoing genetic exchange. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violated as evolutionary divergence increases. To test the limitations of introgression detection methods when being applied to old systems, we simulated thousands of genomic datasets under a wide range of settings, with varying degrees of among-species rate variation and introgression. Using these simulated datasets, we showed that some commonly applied statistical methods, including the D-statistic and certain tests based on sets of local phylogenetic trees, can produce false-positive signals of introgression between divergent taxa that have different rates of evolution. These misleading signals are caused by the presence of homoplasies occurring at different rates in different lineages. To distinguish between the patterns caused by rate variation and genuine introgression, we developed a new test that is based on the expected clustering of introgressed sites along the genome, and implemented this test in the program Dsuite.
evolutionary biology
What problem does this paper attempt to address?
The paper asks whether current methods for detecting introgression (gene flow between species through hybridization and backcrossing) remain reliable when applied to ancient systems. Specifically, the authors examine whether these methods can be trusted for gene flow events that occurred in the distant past, when evolutionary rates differ among species. Many existing methods, such as the D-statistic, were originally developed for population-level data or recently diverged species, and they implicitly assume that the evolutionary rate is constant across species. This assumption is more likely to be violated as evolutionary divergence increases. To evaluate the methods under these conditions, the authors simulated thousands of genomic datasets with varying degrees of among-species rate variation and introgression. They found that some commonly used methods, including the D-statistic and certain tests based on sets of local phylogenetic trees, can produce false-positive signals of introgression between divergent taxa that evolve at different rates. These misleading signals arise from homoplasies that occur at different rates in different lineages. To address this, the authors developed a new test based on the expected clustering of introgressed sites along the genome and implemented it in the program Dsuite. The test is intended to distinguish patterns caused by rate variation from genuine introgression, making the detection of ancient gene flow more reliable.
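For readers unfamiliar with the D-statistic mentioned above, the following is a minimal sketch of its standard definition, D = (ABBA - BABA) / (ABBA + BABA), computed from four aligned sequences with the topology ((P1, P2), P3, outgroup). It is not the Dsuite implementation and it does not implement the paper's clustering test; the function name `d_statistic` and the toy sequences are hypothetical and chosen only for illustration. The sketch assumes one haploid sequence per taxon and takes the outgroup allele as ancestral.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (abba, baba, D).

    Assumes four aligned haploid sequences of equal length; the
    outgroup allele at each site is treated as ancestral ("A").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        alleles = {a, b, c, o}
        # Skip sites with gaps or ambiguity codes, and non-biallelic sites.
        if not alleles <= valid or len(alleles) != 2:
            continue
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele

    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return abba, baba, d


# Toy alignment (hypothetical data): two ABBA sites, one BABA site.
abba, baba, d = d_statistic("AAGAA", "GGAAA", "GGGAG", "AAAAA")
print(abba, baba, d)
```

In the absence of introgression and with equal rates, ABBA and BABA sites are expected to be equally frequent, so D is close to zero. The paper's point is that a nonzero D can also arise without introgression when lineages evolve at different rates, because homoplasies then accumulate unevenly between P1 and P2.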