Assembly of the 373k Gene Space of the Polyploid Sugarcane Genome Reveals Reservoirs of Functional Diversity in the World's Leading Biomass Crop

Glaucia Mendes Souza, Marie-Anne Van Sluys, Carolina Gimiliani Lembke, Hayan Lee, Gabriel Rodrigues Alves Margarido, Carlos Takeshi Hotta, Jonas Weissmann Gaiarsa, Augusto Lima Diniz, Mauro de Medeiros Oliveira, Savio de Siqueira Ferreira, Milton Yutaka Nishiyama, Felipe ten-Caten, Geovani Tolfo Ragagnin, Pablo de Morais Andrade, Robson Francisco de Souza, Gianlucca Goncalves Nicastro, Ravi Pandya, Changsoo Kim, Hui Guo, Alan Mitchell Durham, Monalisa Sampaio Carneiro, Jisen Zhang, Xingtan Zhang, Qing Zhang, Ray Ming, Michael C. Schatz, Bob Davidson, Andrew H. Paterson, David Heckerman
DOI: https://doi.org/10.1093/gigascience/giz129
IF: 7.658
2019-01-01
GigaScience
Abstract:
Background: Sugarcane cultivars are polyploid interspecific hybrids of giant genomes, typically with 10–13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated to have >10 Gb, pose a challenge for sequencing.
Results: Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes in diploid grasses to the putative genes indicates that we could resolve 2–6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 “subgenomes” after their divergence ∼3.8–4.6 million years ago and reveals single-nucleotide variants that may underlie their differences.
Conclusions: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improve biomass and food production.