PacBio Sequencing Reveals Transposable Elements as a Key Contributor to Genomic Plasticity and Virulence Variation in Magnaporthe oryzae

Jiandong Bao, Meilian Chen, Zhenhui Zhong, Wei Tang, Lianyu Lin, Xingtan Zhang, Haolang Jiang, Deyu Zhang, Chenyong Miao, Haibao Tang, Jisen Zhang, Guodong Lu, Ray Ming, Justice Norvienyeku, Baohua Wang, Zonghua Wang
DOI: https://doi.org/10.1016/j.molp.2017.08.008
IF: 27.5
2017-01-01
Molecular Plant
Abstract: The sustainable cultivation of rice, which serves as a staple food crop for more than half of the world's population, is under serious threat from the huge yield losses inflicted by rice blast disease, caused by the globally destructive fungus Magnaporthe oryzae (Pyricularia oryzae) (Dean et al., 2012; Nalley et al., 2016; Deng et al., 2017). This filamentous ascomycete fungus can also cause blast disease on other economically important cereal crops, including wheat, millet, and barley, making it the world's most important plant pathogenic fungus (Zhong et al., 2016). The advent of whole-genome sequencing technology and the subsequent deployment of next-generation sequencing (NGS) strategies have generated genome assemblies for over 50 isolates of M.
oryzae, which have been instrumental in improving our understanding of how the rice blast fungus undergoes host adaptation, host specificity, and host range expansion to overcome host resistance (Dean et al., 2005; Xue et al., 2012; Wu et al., 2015; Zhang et al., 2016). However, comparative genomic studies based on NGS-assembled genomes do not give an in-depth account of the genomic features that contribute to the prevailing genomic variation within M. oryzae, because NGS assemblies are highly fragmented and lack most of the lineage-specific (LS) regions, which are more plastic than the core genome and enriched with repeats and effector proteins (Raffaele and Kamoun, 2012; Faino et al., 2016). Accumulating evidence has shown that transposable elements (TEs) play a crucial role in driving genomic plasticity and pathogenicity variation in many plant pathogenic fungi (Raffaele and Kamoun, 2012; Faino et al., 2016). The genome-wide influence of TEs on virulence evolution in M. oryzae has not been well investigated, largely because the current Sanger sequencing-derived reference genome is from the laboratory strain 70-15, which was generated from a cross between the rice isolate Guy11 and a weeping love grass (Eragrostis curvula) isolate. The composition and distribution of TEs in 70-15 may differ greatly from those in rice-infecting isolates and may not reflect the genome-wide influence of TEs on virulence evolution of M. oryzae field isolates. Furthermore, genome assemblies of other naturally occurring M. oryzae field isolates generated with NGS platforms do not support the determination of full-length TEs because of their short reads.

To address these limitations, we used single-molecule real-time sequencing developed by Pacific Biosciences (PacBio) to generate near-complete genome assemblies for M.
oryzae field isolates FJ81278 and Guy11, and evaluated the possible contribution of TEs to genomic variation events such as chromosomal translocation, gene presence/absence in LS regions, and polymorphism of virulence-associated secreted proteins (SPs).

We showed that PacBio sequencing significantly improved the quality of the genome assembly. Compared with Illumina-based short-read assembly, PacBio long-read assembly yielded a ∼10% larger genome size, a ∼20-fold higher contig N50 (4.13 Mb versus 0.156 Mb for FJ81278 and 3.28 Mb versus 0.18 Mb for Guy11), and a >95% reduction in genome fragmentation (54 versus 1415 contigs for FJ81278 and 56 versus 1182 for Guy11) (Supplemental Table 1 and Supplemental Figure 1). Approximately 98% of the PacBio-assembled contigs were longer than 100 kb (Supplemental Table 1 and Supplemental Figure 1). The increased size of the PacBio-assembled genomes was not accompanied by a corresponding increase in the number of new genes but resulted from a significant increase in the recovery of repeat sequences (Supplemental Table 1 and Supplemental Figure 4A). The PacBio assemblies of FJ81278 and Guy11 contained an additional 239 and 149 genes, respectively, compared with the corresponding Illumina assemblies (Supplemental Table 1). However, a significant proportion of this increase in gene number was due to a higher incidence of gene duplication (Supplemental Table 2). We further performed de novo TE prediction and identified seven new TEs, consisting of one DNA transposon, three LTR retrotransposons, and three of unknown class (Supplemental Figure 2 and Supplemental Table 3). The proportion of TEs in the PacBio assemblies was ∼10% higher than that recorded in the Illumina assemblies (Supplemental Tables 1 and 3).
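
The assembly comparison above rests on contig N50 and a few simple ratios. The following is a minimal Python sketch, not the authors' pipeline: `contig_n50` is a hypothetical helper using the common definition of N50, the toy contig lengths are invented for illustration, and only the (PacBio, Illumina) pairs are taken from the figures quoted in the text.

```python
def contig_n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty contig list")

# Toy contig lengths, for illustration only (not data from the study).
toy_n50 = contig_n50([10, 20, 30, 40])

# Values quoted in the text, as (PacBio, Illumina) pairs per isolate.
n50_mb = {"FJ81278": (4.13, 0.156), "Guy11": (3.28, 0.18)}
contigs = {"FJ81278": (54, 1415), "Guy11": (56, 1182)}

# Fold increase in N50 and fractional reduction in contig count.
n50_fold = {k: pb / il for k, (pb, il) in n50_mb.items()}
frag_reduction = {k: 1 - pb / il for k, (pb, il) in contigs.items()}

for isolate in n50_mb:
    print(f"{isolate}: N50 fold increase {n50_fold[isolate]:.1f}, "
          f"contig reduction {frag_reduction[isolate]:.1%}")
```

With the quoted values, the N50 fold increases come out at roughly 26 for FJ81278 and 18 for Guy11, which bracket the ∼20-fold figure, and both contig reductions exceed 95%.
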
Among the TE families in the PacBio assemblies, LTR elements accounted for about half of the TE content (∼7% of ∼14%), while DNA elements and LINEs together accounted for ∼30% (∼4% of ∼14%) (Supplemental Table 3).

In addition, we observed that SPs are concentrated in the LS regions of the genome (LS regions are highly dynamic regions among isolates) (6.67%–14.75% versus 5.3%) at the whole-genome level in the three isolates (Supplemental Table 4). To further confirm this result, an additional 60 isolates (Supplemental Table 5) were used to delineate the boundary between LS regions and isolate-specific regions (genomic regions that exist uniquely in a specific isolate). In this analysis, the proportion of SPs in LS regions was 2-fold higher than that of non-secreted proteins (non-SPs) (26% versus 13%), and no SPs were located in isolate-specific regions (Figure 1F). Chromosome distribution analysis further showed that the peaks of SPs located in LS regions are also enriched with TEs (Figure 1B–1D). However, the gene-gene distance of SPs was slightly, but not significantly, larger than that of non-SPs (Supplemental Figure 4B). TE-association analysis in the ±1 kb gene flanking regions revealed that the percentage of TE-associated SPs is ∼3-fold higher than the percentage of TE-associated non-SPs (Figure 1G). Within these associations, Pot2, rnd-2_family-240, MGR583, Mg-SINE, and Pot3 were the top five TE types, and the level of Pot3 elements associated with SPs was ∼2-fold higher than that associated with non-SPs (P = 0.024, paired t-test; Supplemental Figure 3). We identified 264 core, non-TE-associated SPs (∼40%), most of which were hypothetical proteins (Supplemental Tables 6 and 7).
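
The ±1 kb flanking-region test described above can be illustrated with a small interval-overlap sketch. This is an assumption-laden illustration rather than the authors' method: the function names, the 1-based inclusive coordinate convention, the choice to count TEs overlapping the gene body as well as its flanks, and all coordinates are hypothetical.

```python
def overlaps_flank(gene, te, flank=1000):
    """True if the TE overlaps the gene extended by `flank` bp per side.
    Intervals are (chrom, start, end), 1-based inclusive (assumed)."""
    g_chrom, g_start, g_end = gene
    t_chrom, t_start, t_end = te
    if g_chrom != t_chrom:
        return False
    return t_start <= g_end + flank and t_end >= g_start - flank

def te_associated_fraction(genes, tes, flank=1000):
    """Fraction of genes with at least one TE inside the window."""
    if not genes:
        return 0.0
    hits = sum(any(overlaps_flank(g, t, flank) for t in tes) for g in genes)
    return hits / len(genes)

# Hypothetical toy coordinates, not taken from the study.
tes = [("chr1", 5000, 6800), ("chr2", 100, 900)]
sps = [("chr1", 7500, 8200), ("chr1", 20000, 21000)]
non_sps = [("chr1", 40000, 41000), ("chr2", 5000, 6000),
           ("chr2", 1500, 2500), ("chr1", 9000, 9500)]

sp_frac = te_associated_fraction(sps, tes)
non_sp_frac = te_associated_fraction(non_sps, tes)
print(f"TE-associated SPs: {sp_frac:.0%}, non-SPs: {non_sp_frac:.0%}")
```

Comparing the two fractions gives the kind of fold difference reported in Figure 1G. On real annotations, an interval tree or a sorted sweep would replace the quadratic loop.
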
Further domain and functional annotation analysis showed that 43 of these core, non-TE-associated SPs contain functional domains known to be associated with virulence and pathogenesis of microbial pathogens (Supplemental Table 6). This analysis also revealed that 30 of these core SPs constitute 14 paralog groups (Supplemental Table 6). We also observed that some of the non-core, TE-associated SPs identified in this study are avirulence genes such as AvrPib and AvrPiz-t (Supplemental Table 5).

To evaluate the quality of the PacBio-assembled genomes, we conducted gap filling for the 70-15 reference genome. Guy11 and FJ81278 filled 94 and 84 gaps in 70-15, respectively, for a total of 110 of 157 gaps (∼70%) (Supplemental Tables 8 and 9), >2-fold more than previously obtained with Illumina-based assemblies (Xue et al., 2012). We also conducted whole-genome alignment of the PacBio assemblies against the reference genome. Most contigs were consistent with the reference genome and ran from end to end along a continuous main diagonal. Interestingly, chromosome 5 in FJ81278 is represented by only one contig (Figure 1H). These data confirmed that the PacBio assemblies have sufficient genome coverage and superior integrity. Our investigation also revealed one intra-chromosomal inversion in chromosome 6 of FJ81278, two other large-scale inter-chromosomal rearrangements between chromosomes 1 and 6 of FJ81278, and another one between chromosomes 1 and 4 of Guy11 (Figure 1H).
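
The gap-filling counts reported earlier in this passage can be cross-checked with inclusion-exclusion. The sketch below assumes that the total of 110 counts distinct gaps closed by at least one of the two assemblies; that reading is an interpretation of the text, and `gap_fill_summary` is a hypothetical helper.

```python
def gap_fill_summary(filled_a, filled_b, filled_union, total_gaps):
    """Split gap-filling counts into shared and assembly-specific parts.
    Assumes `filled_union` counts gaps closed by at least one assembly."""
    both = filled_a + filled_b - filled_union
    if both < 0 or both > min(filled_a, filled_b):
        raise ValueError("counts are inconsistent with a union of two sets")
    return {
        "both": both,
        "only_a": filled_a - both,
        "only_b": filled_b - both,
        "fraction_filled": filled_union / total_gaps,
    }

# Counts quoted in the text: Guy11 = 94, FJ81278 = 84, total 110 of 157.
summary = gap_fill_summary(94, 84, 110, 157)
print(summary)
```

Under that assumption, 68 gaps would be closed by both assemblies, 26 by Guy11 alone, and 16 by FJ81278 alone, with 110/157 giving the ∼70% quoted in the text.
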
We confirmed the existence of large-scale structural variations (SVs) by long-read mapping. Compared with the reference genome, the numbers of all SV types identified in FJ81278 were much higher than those recorded in Guy11. A total of 500 SVs consisting of 338 deletions (DEL), 37 insertions (INS), 62 inversions (INV), and 63 translocations (TRA) were detected in FJ81278/70-15, while a total of 108 SVs consisting of 75 DELs, 12 INSs, 9 INVs, and 11 TRAs were detected in Guy11/70-15 (Supplemental Figure 5). We further examined the large-scale inter-chromosomal rearrangement (Chr1/Chr6) observed in FJ81278 by analyzing the long-read mapping data (Figure 1I). Interestingly, we identified two clustered mobile elements (MAGGY and MGRL3, ∼6 kb) located at the center of the breakpoint. This observation suggests the existence of TE-mediated inter-chromosomal rearrangement events in M. oryzae.

In summary, we used PacBio sequencing technology to perform de novo whole-genome sequencing of field isolates FJ81278 and Guy11. PacBio sequencing significantly improved genome quality and yielded a larger (∼10%) genome size than the reference genome, filling ∼70% of the gaps in the reference genome. Compared with Illumina assemblies, the PacBio genomes showed a ∼20-fold increase in N50, a ∼95% reduction in contig numbers, and a ∼10% increase in repeats. Importantly, the PacBio assemblies revealed that TEs play a key role in regulating genomic plasticity, promoting chromosome rearrangement and the presence/absence polymorphism of SP genes.

This work was supported by the National Natural Science Foundation of China (U1305211 and 91231121 to Z.W. and 31301621 to J.B.), the National Key Research and Development Program of China (2016YFD0300700 to Z.W.), and the Science Fund for Distinguished Young Scholars of Fujian Agriculture and Forestry University to J.B. (XJQ201511).