A new reference genome for Sorghum bicolor reveals high levels of sequence similarity between sweet and grain genotypes: implications for the genetics of sugar metabolism

Elizabeth A. Cooper,Zachary W. Brenton,Barry S. Flinn,Jerry Jenkins,Shengqiang Shu,Dave Flowers,Feng Luo,Yunsheng Wang,Penny Xia,Kerrie Barry,Chris Daum,Anna Lipzen,Yuko Yoshinaga,Jeremy Schmutz,Christopher Saski,Wilfred Vermerris,Stephen Kresovich
DOI: https://doi.org/10.1186/s12864-019-5734-x
IF: 4.547
2019-05-27
BMC Genomics
Abstract:

**Background:** The process of crop domestication often consists of two stages: initial domestication, where the wild species is first cultivated by humans, followed by diversification, when the domesticated species are subsequently adapted to more environments and specialized uses. Selective pressure to increase sugar accumulation in certain varieties of the cereal crop *Sorghum bicolor* is an excellent example of the latter; this has resulted in pronounced phenotypic divergence between sweet and grain-type sorghums, but the genetic mechanisms underlying these differences remain poorly understood.

**Results:** Here we present a new reference genome based on an archetypal sweet sorghum line and compare it to the current grain sorghum reference, revealing a high rate of nonsynonymous and potential loss of function mutations, but few changes in gene content or overall genome structure. We also use comparative transcriptomics to highlight changes in gene expression correlated with high stalk sugar content and show that changes in the activity and possibly localization of transporters, along with the timing of sugar metabolism play a critical role in the sweet phenotype.

**Conclusions:** The high level of genomic similarity between sweet and grain sorghum reflects their historical relatedness, rather than their current phenotypic differences, but we find key changes in signaling molecules and transcriptional regulators that represent new candidates for understanding and improving sugar metabolism in this important crop.
genetics & heredity,biotechnology & applied microbiology
What problem does this paper attempt to address?
This paper seeks to understand the genetic mechanisms underlying the differences in sugar metabolism between sweet sorghum and grain sorghum. Specifically, the researchers address the following questions:

1. **Differences at the genomic level**: Are there significant differences in genome structure and gene content between sweet sorghum and grain sorghum? How do these differences affect their phenotypes, especially the capacity to accumulate sugar?
2. **Differences at the transcriptomic level**: How do gene expression patterns differ between sweet sorghum and grain sorghum across developmental stages and tissues? Are these differences related to sugar accumulation?
3. **Functional changes of key genes**: Which genes or gene families have undergone functional changes (such as mutation, loss, or differential expression) in sweet sorghum, and how do these changes affect sugar transport and metabolism?

### Research background

Sorghum (Sorghum bicolor) is a widely planted crop. Domestication and subsequent diversifying selection have produced distinct varieties, some grown for grain and others for the sugar extracted from their stalks. Sweet sorghum accumulates large amounts of soluble sugar in its stalks, while grain sorghum is mainly used for food production. Although the two types differ clearly in phenotype, the genetic mechanisms behind these differences are not fully understood.

### Main research results

1. **Genome comparison**:
   - By comparing the archetypal sweet sorghum line "Rio" with the grain sorghum reference genome "BTx623", the researchers found that the two are highly similar in genome structure and gene content, but differ by a large number of non-synonymous mutations and potential loss-of-function mutations.
   - Although gene number and overall structure are similar, some large gene families (such as disease-resistant protein kinases and transcriptional regulators) show clear expansion and contraction.
2. **Transcriptome analysis**:
   - A comparison of gene expression in sweet and non-sweet sorghum across developmental stages and tissues showed that sweet sorghum accumulates more sugar in the stalks and that the expression patterns of its sugar-metabolism-related genes differ significantly from those of grain sorghum.
   - In particular, several sucrose transporters (such as SUT4, SWEET3-3, and SWEET8-2) are completely absent or severely truncated in sweet sorghum, which may be one of the key factors leading to sugar accumulation in sweet sorghum.
3. **Functional changes in key genes**:
   - The researchers found that some genes related to sugar metabolism (such as SIP2 and SWEET3-6) carry missense mutations in sweet sorghum, and that the expression patterns of these genes are closely related to sugar accumulation.
   - These changes may regulate sugar accumulation in sweet sorghum by affecting the sugar transport and metabolism pathways.

### Conclusion

This study shows that although sweet sorghum and grain sorghum are highly similar in genome structure, differences in gene expression and function between them may be the key factors behind their differences in sugar accumulation. The results provide important clues to the genetic basis of sugar metabolism in sorghum, along with new candidate genes for breeding varieties with higher sugar yield.
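
The mutation categories named in the genome comparison (synonymous, non-synonymous or missense, and potential loss of function) can be illustrated with a minimal Python sketch. It classifies a single-codon substitution using the standard genetic code. The function name `classify_substitution` and the example codons are hypothetical, chosen only for illustration; this is not the annotation pipeline used in the paper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third position varying fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid.

    Hypothetical helper for illustration only.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        # A premature stop codon truncates the protein.
        return "nonsense (potential loss of function)"
    if ref_aa == "*":
        return "stop-lost"
    return "missense (non-synonymous)"


# Illustrative codons only; these are not variants reported in the paper.
for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA")]:
    print(ref, "->", alt, ":", classify_substitution(ref, alt))
```

A real comparison between two assemblies would also need to handle insertions, deletions, and frameshifts, which this sketch deliberately ignores.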
### Formula representation

The formulas relevant to this paper concern the statistical analysis of sugar content and gene expression levels, for example:

- The trend of Brix value over time can be represented by a linear regression model:

  \[ Brix = \beta_0 + \beta_1 \cdot Time + \epsilon \]

  where \(Brix\) represents the soluble sugar concentration, \(Time\) represents time, \(\beta_0\) and \(\beta_1\) are the regression coefficients (intercept and slope), and \(\epsilon\) is an error term.

- Significance testing of differentially expressed genes can use a generalized linear model (GLM), written here with normally distributed errors:

  \[ Y_{ij} \sim N(\mu_{ij}, \sigma^2) \]

  where \(Y_{ij}\) represents the expression level of the \(i\)-th gene in the \(j\)-th sample, \(\mu_{ij}\) is the mean, and \(\sigma^2\) is the variance.

These formulas help in interpreting the biological significance of the data and provide a statistical basis for subsequent research.
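
As a minimal sketch of the Brix regression model, the coefficients \(\beta_0\) and \(\beta_1\) can be estimated by ordinary least squares. The function name `fit_line` and the data points below are hypothetical, made up purely to show the arithmetic; they are not measurements from the paper.

```python
def fit_line(times, brix):
    """Ordinary least-squares fit of brix = beta0 + beta1 * time.

    Hypothetical helper for illustration; returns (beta0, beta1).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_b = sum(brix) / n
    # Sum of squared deviations of time, and the time/Brix cross-deviations.
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(times, brix))
    beta1 = sxy / sxx                # slope: change in Brix per unit time
    beta0 = mean_b - beta1 * mean_t  # intercept: fitted Brix at time zero
    return beta0, beta1


# Made-up example values (days, and Brix readings in degrees Brix).
days = [0, 10, 20, 30, 40]
brix_readings = [4.0, 7.0, 10.0, 13.0, 16.0]

beta0, beta1 = fit_line(days, brix_readings)
print(f"intercept = {beta0:.2f}, slope = {beta1:.2f} Brix units per day")
```

A positive slope \(\beta_1\) corresponds to sugar concentration rising over the sampled period. Comparing slopes between genotypes would additionally require standard errors, which this sketch omits.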