A Super Pan‐genome Map Provides Genomic Insights into Evolution of Diploid Cotton Species

Xueqiang Wang, Hejun Lu, Yan Zhao, Zhiyuan Zhang, Jun Li, Zeyu Dong, Yupeng Hao, Lei Fang, Xueying Guan, Ting Zhao, Yan Hu, Tianzhen Zhang
DOI: https://doi.org/10.1002/imo2.15
2024-01-01
Abstract: A high-quality super pan-genome was built using 22 representative diploid cotton species. Adaptive evolution among extant Gossypium species was investigated. Specific genes were enriched for different terms, revealing variations in characteristics of different cotton species. A total of 321 hotspot regions of structural variations (SVs), containing 90 genes associated with fiber initiation and/or elongation, were detected. A 444-bp deletion in the promoter sequence of GoNe that explained the lack of foliar nectary in G. gossypiodes (D6) and G. schwendimanii (D11) was identified.
To the editor,
The Gossypium genus can be divided into eight diploid cotton groups (A, B, C, D, E, F, G, and K genomes), comprising 45 diploid species. Inferring ancestral genomes (IAG) among extant Gossypium species is an important goal of comparative genomics, and several mathematical models and approaches have been proposed for IAG. Recently, a new framework, inferring ancestor genome structure (IAGS), was described [1]. Over the last decades, there have been several efforts to deeply sequence many diploid cotton species, but pan-genomic analyses have mainly focused on tetraploid cotton species [2-7]. A tetraploid cotton pan-genome was constructed by combining newly sequenced genomes of Gossypium hirsutum L. and two other wild species with sequences from five previously published tetraploid cotton species [8]. A pan-genome including 10 representative Gossypium diploid genomes linked changes in chromatin structures to phenotypic differences in cotton fiber and identified regulatory variations that control the genetic basis of fiber length [2]. That study focused on one D5 genome, with less attention to other species of the D genome. A more recent article mainly examines the evolutionary history of Gossypium and the mechanisms underlying its rapid adaptive radiation, with a particular focus on the roles of incomplete lineage sorting and gene flow [9].
Thus, the purpose of our study was to construct a pan-genome and carry out an in-depth comparative genomic analysis of diploid cotton, including all species of the D genome and the other species of Gossypium, and to perform a comparative genomic analysis of the nectary development gene (GoNe) in different species of Gossypium [10]. In this study, we investigated 22 diploid cotton species and their wild relative Gossypioides kirkii (Mast.) J. B. Hutch. [11], all with high-quality genomes publicly available in CottonGen (https://www.cottongen.org/), inferred the ancestral genome structure (IAGS) among extant Gossypium species, and constructed a super pan-genome of cultivated and wild diploid cotton species. The structural variations (SVs) in different diploid cottons were examined and hotspot regions of SVs were detected. We investigated the presence or absence of foliar nectary in 17 diploid cotton species and carried out the comparative genomic analysis of GoNe in different species [10]. The 22 diploid cotton genomes used for pan-genomic analysis represented seven of the eight recognized genome groups and 19 representative diploid cotton species, including the wild relative G. kirkii (Kirkii). The assembly completeness of each genome was evaluated with Benchmarking Universal Single-Copy Orthologs (BUSCO) (Table S1). Transposable elements (TEs) were annotated to classify and assess their distribution in the cotton genomes. The most TEs were found in Gossypium rotundifolium Fryxell et al. (K2), and the fewest in the wild relative Kirkii (Figure 1A; Figure S1; Table S2). There was a significant increase in the length of TEs from the D genome to the G and K genomes, and from the B, E, and F genomes to the A genome, suggesting that TEs may have exerted important influences on the evolution of cotton.
There was a significant positive correlation between the proportion of TEs in the total sequence of each genome and assembly length, suggesting that the increase of TEs might contribute to genome amplification (Figure 1A; Figure S1; Table S2). Gypsy and Copia long terminal repeats were identified as significant contributors to the genome amplification process. The genetic relationships, evolution, and divergence times of the 22 diploid cotton genomes were then analyzed using whole-genome sequences. A maximum-likelihood phylogenetic tree was constructed using 352 single-copy coding genes, revealing two distinct clades with D genome diploid cotton species forming one clade. The cultivated diploid cotton diverged from the wild diploid cotton species about 5.45 Mya, and the divergence time between diploid cotton species and their wild relative G. kirkii was about 10.13 Mya (Figure 1B; Figure S2). Collinearity blocks between assembled genomes were identified, and 13 collinearity blocks between Gossypium herbaceum L. (A1) and Gossypium arboreum L. (A2) were disordered on different corresponding genomes, suggesting their importance in diploid cotton evolution (Figure 1C). The Gossypium ancestor genome was inferred using the genome median problem model, and chromosome fission and inversions were found to be fundamental forces for speciation. Large chromosome inversions may drive species formation and diversity (Figure 1D; Figure S3). Future studies will investigate the causes of fission and whether it was driven by selection or environmental upheaval. We then constructed a super pan-genome by evaluating the 22 genomes and gene annotations of diploid cotton species. The pan-genome contained 67,807 genes, including 22,384 core, 34,093 variable, and 11,330 specific genes (Figure 1E–G; Tables S3–5). Kyoto Encyclopedia of Genes and Genomes pathway and gene ontology enrichment analysis of core genes showed terms related to growth and development of cotton (Table S6).
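The split of a pan-genome into core, variable, and specific genes can be illustrated with a minimal Python sketch. The thresholds are not stated in this section, so the sketch assumes a common convention: core genes are present in all genomes, specific genes in exactly one, and variable genes in any number in between. The function name `classify_genes`, the input layout, and the toy gene IDs are hypothetical and are not taken from the authors' pipeline or from Table S4.

```python
from collections import Counter


def classify_genes(pav, n_genomes):
    """Label each gene as core, variable, or specific.

    pav maps a gene ID to the set of genomes in which the gene is present
    (a hypothetical input layout, not the format of Table S4).
    """
    labels = {}
    for gene, genomes in pav.items():
        n_present = len(genomes)
        if n_present == 0:
            raise ValueError(f"{gene} is absent from every genome")
        if n_present == n_genomes:
            labels[gene] = "core"      # assumed: present in all genomes
        elif n_present == 1:
            labels[gene] = "specific"  # assumed: present in exactly one genome
        else:
            labels[gene] = "variable"  # present in some, but not all, genomes
    return labels


# Toy presence/absence data for three genomes (made-up gene IDs).
toy_pav = {
    "gene1": {"A1", "A2", "D5"},
    "gene2": {"A1", "A2"},
    "gene3": {"D5"},
}
toy_counts = Counter(classify_genes(toy_pav, n_genomes=3).values())
print(dict(toy_counts))

# Sanity check: the three categories reported in the text add up to the total.
reported = {"core": 22384, "variable": 34093, "specific": 11330}
print(sum(reported.values()))  # 67807
```

Under these assumed definitions every gene falls into exactly one category, which is consistent with the three reported counts summing to the 67,807 genes of the pan-genome.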
Specific genes were enriched for different terms, revealing variations in characteristics of different cotton species [2-4] (Figures S4–12; Tables S7–13). For instance, specific genes in G. herbaceum (A1) and G. arboreum (A2) were related to disease resistance and lint yield, respectively (Figures S4 and S5; Tables S8 and S9). Specific genes in Gossypium raimondii Ulbr. (D5) were associated with biomass, fiber quality, and stress/disease resistance (Figures S6 and S7; Table S10). The specific genes in Gossypium anomalum Waw. & Peyr. (B1) were linked to drought tolerance (Figure S11; Table S12) [3], corresponding to their species-specific characteristics. To overcome reference genome bias, SVs were identified in the 22 assembled diploid cotton genomes using three reference genomes. Results showed differences in the total number and types of SVs across cotton species (Figure 1H; Figures S13–16; Tables S14–17). Repeat contraction was the most common SV type and deletion the least common. Gossypium armourianum Kearney (D2-1) had the greatest number of SVs and G. kirkii had the fewest. Wild D genomes had a 1.5-fold greater number of SVs than cultivated cotton species (A1 and A2 genomes), and there were no significant differences in SVs between wild (B1, E1, F1, and G2 genomes) and cultivated cotton species (A1 and A2 genomes) using K2 as reference (Figures S13 and S16; Table S14). Unevenly distributed SVs were identified in 321 SV hotspot regions, including 90 genes associated with fiber initiation and/or elongation (Figure S16; Table S18). The A2 genome had fewer SVs in these regions, so its genes associated with fiber initiation and/or elongation were less impacted by SVs, which possibly explains its high lint yield and fiber quality.
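The idea of an SV hotspot region can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method used in the study: it bins SV start positions into fixed, non-overlapping windows and reports windows whose SV count reaches a chosen threshold. The function name `sv_hotspots`, the window size, the threshold, and the toy coordinates are all hypothetical.

```python
from collections import Counter


def sv_hotspots(sv_positions, window_size, min_count):
    """Return windows that contain at least min_count SVs.

    sv_positions is an iterable of (chromosome, position) pairs with
    0-based positions. Each result is (chromosome, start, end, count),
    where start is inclusive and end is exclusive.
    """
    # Count SVs per (chromosome, window index).
    counts = Counter(
        (chrom, pos // window_size) for chrom, pos in sv_positions
    )
    return sorted(
        (chrom, idx * window_size, (idx + 1) * window_size, n)
        for (chrom, idx), n in counts.items()
        if n >= min_count
    )


# Toy SV coordinates (made up for illustration).
toy_svs = [
    ("Chr01", 100),
    ("Chr01", 250),
    ("Chr01", 900),
    ("Chr01", 1500),
    ("Chr02", 50),
]
toy_hotspots = sv_hotspots(toy_svs, window_size=1000, min_count=3)
print(toy_hotspots)  # [('Chr01', 0, 1000, 3)]
```

Genes overlapping the reported windows could then be intersected with a list of fiber-related genes. A real analysis would also have to choose the window size and threshold carefully, for example from the genome-wide distribution of SV counts.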
Finally, foliar nectaries in Gossypium provide the plant with defense against herbivores [10], and phenotypic investigations were conducted on 17 cotton species, classified into seven diploid cotton groups, to determine the presence of foliar nectaries. No foliar nectary was found in Gossypium gossypiodes (Ulbr.) Standl. (D6), consistent with previous studies [12], Gossypium schwendimanii Fryxell & S. D. Koch (D11), or Gossypium tomentosum Nutt. ex Seem. ((AD)3), an allotetraploid cotton species (Figure 1I,M; Figure S17). Comparative analysis of GoNe expression revealed no expression in D6 and D11, suggesting that a lack of GoNe function prevents foliar nectary development in these two wild diploid species (Figure 1N; Figure S18; Table S19). Sequence analysis showed a large deletion (a 444-bp fragment) in the GoNe promoter sequences of the D6 and D11 species compared with other diploid cotton species that have foliar nectaries (Figures S19 and S20).
In conclusion, we constructed a high-quality super pan-genome of 22 diploid cotton species and investigated their adaptive evolution. The pan-genome contained 67,807 genes, including core, variable, and specific genes, and we identified SVs and SV hotspot regions containing genes associated with fiber initiation and/or elongation. The study also investigated the absence of foliar nectary in G. gossypiodes and G. schwendimanii, and identified the deletion in the promoter sequence of GoNe as the cause. This study provides insights into the genetic diversity of diploid cotton and its dynamic genomic variation during expansion, which can aid modern cotton breeding.
Xueqiang Wang: Conceptualization; methodology; investigation; formal analysis; funding acquisition; writing – original draft. Hejun Lu: Formal analysis; conceptualization; methodology; writing – review and editing. Yan Zhao: Conceptualization; methodology; formal analysis; writing – review and editing; funding acquisition.
Zhiyuan Zhang: Formal analysis; conceptualization; methodology; writing – review and editing; funding acquisition. Jun Li: Conceptualization; writing – review and editing. Zeyu Dong: Investigation. Yupeng Hao: Investigation. Lei Fang: Conceptualization; writing – review and editing. Xueying Guan: Conceptualization; writing – review and editing. Ting Zhao: Investigation; writing – review and editing. Yan Hu: Conceptualization; writing – review and editing; supervision. Tianzhen Zhang: Conceptualization; supervision; writing – review and editing; funding acquisition.
The present study was supported by the Project of Hainan Provincial Natural Science Foundation of China (323QN313), Hainan Yazhou Bay Seed Laboratory in Hainan Province (B21Y10402 & B22C10403), the China Postdoctoral Science Foundation (2022M722808), the Leading Innovative and Entrepreneur Team Introduction Program of Zhejiang (2019R01002), the Fundamental Research Funds for the Central Universities (226-2022-00100), the National Natural Science Foundation of China (NSFC32260179) and Natural Science Foundation of Shandong Province of China (ZR2020MC096).
The authors declare no conflict of interest.
No animals or humans were involved in this study.
The assembled genomes of 27 diploid cotton and one phylogenetic outgroup species Gossypioides kirkii (Kirkii) were downloaded from CottonGen (https://www.cottongen.org/) and the detailed information (DOI and URL) is included in Table S1. The plot code has been submitted to GitHub (https://github.com/xqwang1990/Cotton_Pangenome_Plot). Supporting Information (methods, figures, tables, scripts, graphical abstract, slides, videos, Chinese translated version, and update materials) may be found in the online DOI or iMeta Science https://www.imeta.science/imetaomics/.
Figure S1: The types and numbers of transposable elements (TEs) and TE length/assembly length in different cotton species. The different colors of genomes represent different diploid cottons.
Figure S2: Phylogenetic tree of twenty-two genomes using 352 single-copy coding genes with the phylogenetic outgroup species Gossypioides kirkii (Kirkii).
Figure S3: Ancestor genome construction and dotplot based on the synteny blocks.
Figure S4: Pathway (A) and Gene ontology (GO) (B) enrichment of specific genes unique in A1 genome.
Figure S5: Pathway (A) and Gene ontology (GO) (B) enrichment of specific genes unique in A2 genomes.
Figure S6: Pathway and gene ontology (GO) enrichment of specific genes unique in D5-502 genome.
Figure S7: Gene ontology (GO) enrichment of specific genes unique in D5-4 (A) and D5-8 (B) genomes.
Figure S8: Gene ontology (GO) enrichment of specific genes unique in D1-5 (A) and D8 (B) genomes.
Figure S9: Gene ontology (GO) enrichment of specific genes unique in D3 (A) and D10 (B) genomes.
Figure S10: Gene ontology (GO) enrichment of specific genes unique in B1 (A), E1 (B) and G2 (C) genomes.
Figure S11: The biological process (A), cellular component (B) and molecular function (C) in gene ontology (GO) enrichment of specific genes unique in K2 genome.
Figure S12: Pathway enrichment of specific genes unique in K2 genome.
Figure S13: The types and numbers of SVs in different diploid cotton using the K2 reference.
Figure S14: The types and numbers of SVs in different diploid cotton using the D5-502 reference.
Figure S15: The types and numbers of SVs in different diploid cotton using the A2 reference.
Figure S16: The distributions of SVs in different diploid cotton genomes.
Figure S17: Investigation of foliar nectary in 17 diverse cotton species.
Figure S18: The expression level of GoNe1 (A) and GoNe2 (B) in five diverse cotton species.
Figure S19: Sequence differences of CDS from GoNe1 (for A subgroup) and GoNe2 (for D subgroup) in the diploid cotton species.
Figure S20: Sequence differences of promoter from GoNe1 (for A subgroup) and GoNe2 (for D subgroup) in the diploid cotton species.
Table S1: Information and assessment of twenty-three genomes.
Table S2: The masked sequence and numbers of transposable elements and TE length/assembly length in different cotton species.
Table S3: The IDs and information of genes in our pan-genome.
Table S4: The gene PAVs in different diploid cottons.
Table S5: The gene number in genome numbers of different diploid cotton.
Table S6: Pathway and gene ontology (GO) enrichment of core genes.
Table S7: The specific gene number to each assembly of diploid cotton.
Table S8: Pathway and gene ontology (GO) enrichment of specific genes unique in A1 genome.
Table S9: Pathway and gene ontology (GO) enrichment of specific genes unique in A2 genomes.
Table S10: Pathway and gene ontology (GO) enrichment of specific genes unique in D5 genomes.
Table S11: Pathway and gene ontology (GO) enrichment of specific genes unique in D genomes except D5.
Table S12: Pathway and gene ontology (GO) enrichment of specific genes unique in other genomes except A and D genomes.
Table S13: Pathway and gene ontology (GO) enrichment of specific genes unique in Kirkii genome.
Table S14: The size range and number of SVs in different diploid cotton using the K2 reference.
Table S15: The size range and number of SVs in different diploid cotton using the D5-502 reference.
Table S16: The size range and number of SVs in different diploid cotton using the A2 reference.
Table S17: Location of the detected SVs on the genome of 22 cotton species.
Table S18: List of 321 SV hotspot regions and 90 genes associated with fiber initiation or/and elongation.
Table S19: Oligonucleotides used for qRT-PCR in this study.
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.