Unravelling complex hybrid and polyploid evolutionary relationships using phylogenetic placement of paralogs from target enrichment data

Nora Walden, Christiane Kiefer, Marcus A Koch
DOI: https://doi.org/10.1101/2024.06.28.601132
2024-07-02
Abstract: Phylogenomic datasets comprising hundreds of genes have become the standard for plant systematics and phylogenetics. However, large-scale phylogenomic studies often exclude polyploids and hybrids due to the challenges in assessing the paralog status of targeted loci and incorporating them into tree reconstruction methods. Using a target enrichment dataset of 1081 genes from 452 samples from the Brassicaceae tribe Arabideae, including many hybrid and high-ploidy taxa, we developed a novel approach to disentangle the evolutionary history of this phylogenetically and taxonomically challenging clade. Our approach extends beyond commonly used gene tree-species tree reconciliation techniques by using phylogenetic placement of paralogous sequences into a diploid tree, a method adopted from metagenomics. We call this approach Paralog PhyloGenomics (PPG), and we show how it allows for the simultaneous assessment of the origins of ancient and recent hybrids and autopolyploids, and the detection of nested polyploidization events. Additionally, we demonstrate how synonymous substitution rates provide further evidence for the mode of polyploidization, specifically to distinguish between allo- and autopolyploidization, and to identify hybridization events involving a ghost lineage. Our approach will be a valuable addition to the phylogenomic methods available for the study of polyploids.
Evolutionary Biology
What problem does this paper attempt to address?
The main problem this paper addresses is how to use paralogs recovered from target enrichment data to analyze the evolutionary relationships of complex hybrid and polyploid species. Specifically, the authors developed a new method, Paralog PhyloGenomics (PPG), which places paralog sequences into a diploid tree to reveal the complex evolutionary history of these species.

### Specific background of the problem

1. **Challenges in large-scale phylogenomic studies**: In plant phylogenetics and systematics, datasets containing hundreds of genes have become the standard, yet polyploid and hybrid species remain a major challenge. The main reason is that it is difficult to assess the paralog status of targeted loci and to incorporate paralogs into tree reconstruction methods.
2. **Limitations of existing methods**: Commonly used gene tree-species tree reconciliation techniques handle polyploid and hybrid species poorly, because they usually rely on a predetermined species tree and struggle with complex paralog information.
3. **Research needs in the tribe Arabideae**: Arabideae is one of the largest tribes in the Brassicaceae, with about 550 species, 63% of which are polyploid. Hybridization, reticulate evolution, and introgression are widespread within the tribe, which makes its classification and evolutionary relationships very complex and calls for new analytical methods.

### Core content of the new method

The method proposed by the authors includes the following key steps:

1. **Data collection and processing**: DNA from 452 Arabideae samples is target-enriched for 1,081 genes, sequenced, and processed.
2. **Paralog identification and assembly**: HybPiper is used to assemble gene sequences and to retrieve all full-length contigs per gene, regardless of their coverage or similarity to the reference sequence, so that paralogous copies are retained alongside the main sequence rather than discarded.
3. **Phylogenetic placement of paralogs**: Paralog sequences are placed into a diploid tree by phylogenetic placement, a technique adopted from metagenomics that determines the evolutionary position of each paralog without relying on a predetermined species tree.
4. **Verification and application**: Synonymous substitution rates are analyzed to further distinguish autopolyploidization from allopolyploidization and to identify hybridization events involving "ghost lineages" (unsampled or extinct lineages).

### Advantages of the method

- **No prior knowledge required**: Hybridization events, whole-genome duplications, and ploidy levels do not need to be known in advance.
- **Applicable to large-scale datasets**: It can handle datasets with many samples and genes and is suitable for highly complex polyploid groups.
- **Improved resolution**: It resolves evolutionary relationships more precisely, especially for species that have undergone multiple nested polyploidization events.

In summary, this paper tackles the difficulty that phylogenomic research has with polyploid and hybrid species by developing and applying the Paralog PhyloGenomics method, allowing a better understanding of the evolutionary history of these species.
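To make the phylogenetic placement step described above more concrete, here is a deliberately simplified sketch. It is not the authors' implementation: dedicated placement tools such as pplacer or EPA-ng insert each query sequence onto the branch of a fixed reference tree that maximizes the likelihood, whereas this toy version only assigns each paralog to the reference lineage with the smallest p-distance in a shared alignment. The function names (`p_distance`, `place_nearest`, `placement_profile`), the lineage labels, and the sequences are all hypothetical. It illustrates the basic PPG intuition: the copies of one sample are placed individually, and the resulting profile shows which diploid lineages they attach to.

```python
from collections import Counter

# Hypothetical gap/missing-data symbols; assumes an uppercase nucleotide alignment.
MISSING = set("-N?")

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap/missing columns."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        diffs += x != y
    # No overlapping sites: treat the pair as infinitely distant.
    return diffs / compared if compared else float("inf")

def place_nearest(query: str, references: dict[str, str]) -> str:
    """Assign a query to the reference lineage at the smallest p-distance (crude stand-in for ML placement)."""
    return min(references, key=lambda name: p_distance(query, references[name]))

def placement_profile(paralogs: dict[str, str], references: dict[str, str]) -> Counter:
    """Count how many paralog copies of one sample attach to each diploid reference lineage."""
    return Counter(place_nearest(seq, references) for seq in paralogs.values())

# Toy aligned sequences (hypothetical, not data from the paper).
refs = {
    "lineage_A": "AAAAAAAAAA",
    "lineage_B": "AAAAACCCCC",
    "lineage_C": "CCCCCCCCCC",
}
sample_paralogs = {
    "copy1": "AAAAAAAAAC",
    "copy2": "AAAAACCCCA",
    "copy3": "AAAA-ACCCC",
}
print(dict(placement_profile(sample_paralogs, refs)))  # {'lineage_A': 1, 'lineage_B': 2}
```

In this toy case the three copies of one sample split between two diploid lineages, the kind of pattern that, under the simplifying assumptions here, would flag the sample as a candidate hybrid worth examining with a proper likelihood-based placement.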
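The synonymous substitution rates mentioned in the method description measure how far two sequences have diverged at codon positions where a change does not alter the amino acid. As a hedged illustration of what such a rate (often written Ks) is, the sketch below estimates it between two codon-aligned paralog copies using a simplified version of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. It is not the authors' pipeline, and the function names are hypothetical. It assumes the standard genetic code, counts mutations to stop codons as nonsynonymous, and skips codons that differ at more than one position instead of averaging over mutational pathways as the full method does. Established tools such as PAML's `yn00` provide more complete estimators.

```python
import math
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori synonymous site count for one sense codon (stop mutations count as nonsynonymous)."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                s += 1 / 3
    return s

def ks_simplified(seq1: str, seq2: str) -> float:
    """Jukes-Cantor-corrected Ks between two codon-aligned sequences (simplified, see comments)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = syn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps, ambiguity codes, or stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diff_positions = [p for p in range(3) if c1[p] != c2[p]]
        # Simplification: multi-hit codons are skipped entirely, not pathway-averaged.
        if len(diff_positions) > 1:
            continue
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        if diff_positions and CODE[c1] == CODE[c2]:
            syn_diffs += 1
    if syn_sites == 0:
        raise ValueError("no comparable synonymous sites")
    p_s = syn_diffs / syn_sites
    if p_s >= 0.75:
        return math.inf  # Jukes-Cantor correction undefined (saturation)
    return -0.75 * math.log(1 - 4 * p_s / 3)

# Hypothetical paralog pair differing by one synonymous change (TTT -> TTC, both Phe).
print(round(ks_simplified("TTTCTGAAA", "TTCCTGAAA"), 4))  # 0.824
```

Because skipped multi-hit codons also drop out of the site count, this estimate becomes unreliable for highly divergent paralogs; it is meant only to show the bookkeeping behind a synonymous rate.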