Optimizing Genomic Selection Methods to Improve Prediction Accuracy of Sugarcane Single-Stalk Weight
Zihao Wang, Chengcai Xia, Yanjie Lu, Qi Liu, Meiling Zou, Fenggang Zan, Zhiqiang Xia
DOI: https://doi.org/10.3390/agronomy14122842
2024-01-01
Agronomy
Abstract: Sugarcane (Saccharum spp. hybrids), a vital sugar and energy crop, has considerable development potential worldwide. In sugarcane breeding and variety improvement, single-stalk weight is a crucial selection criterion. Cultivating varieties with heavier single stalks, robust growth, high yields, and superior quality can further enhance the planting efficiency and market competitiveness of sugarcane. Single-stalk weight was determined by measuring individual stalks three times in the field and taking the mean as the phenotypic value. The coefficients of variation of single-stalk weight in the orthogonal and reciprocal populations were 19.3% and 17.7%, respectively, with the reciprocal population showing greater genetic stability. After rigorous filtering of Hyper_seq_FD sequencing data from 409 sugarcane samples, we identified 31,204 high-quality single-nucleotide polymorphisms (SNPs) evenly distributed across all 32 chromosomes, providing a comprehensive representation of the sugarcane genome. In this study, we evaluated the predictive performance of various genomic selection (GS) methods for single-stalk weight in an orthogonal population of 299 individuals (male parent GZ_73-204, female parent GZ_P72-1210) and a reciprocal population of 108 individuals (male parent GZ_P72-1210, female parent GZ_73-204). We first compared five prediction approaches: genomic best linear unbiased prediction (GBLUP), single-step genomic best linear unbiased prediction (SSBLUP), Bayes A, machine learning (ML), and deep learning (DL). The GBLUP model had the highest prediction accuracy, at 0.35, while the deep learning model had the lowest, at 0.20.
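As a rough illustration of what a GBLUP baseline and its "prediction accuracy" involve, the following minimal sketch builds a genomic relationship matrix and scores held-out individuals by the Pearson correlation between predicted and observed phenotypes. Everything here is an assumption for illustration, not the authors' pipeline: the genotypes are simulated with diploid 0/1/2 coding (sugarcane is polyploid, so real dosage coding may differ), the relationship matrix uses the VanRaden form, the variance ratio `lam` is treated as known rather than estimated by REML, and the function names `vanraden_g` and `gblup_predict` are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an (n x m) matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per SNP
    Z = M - 2.0 * p                           # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam):
    """Predict test phenotypes; lam = sigma_e^2 / sigma_g^2 (assumed known here)."""
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

# Simulated toy data (not the study's data).
rng = np.random.default_rng(0)
n, m, h2 = 300, 200, 0.8
freqs = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
g = M @ rng.normal(size=m)
g = (g - g.mean()) / g.std()                  # genetic values with unit variance
y = g + rng.normal(scale=np.sqrt((1.0 - h2) / h2), size=n)

# Single hold-out split; a real evaluation would repeat this (k-fold).
idx = rng.permutation(n)
test, train = idx[:60], idx[60:]

G = vanraden_g(M)
y_hat = gblup_predict(G, y, train, test, lam=(1.0 - h2) / h2)
accuracy = np.corrcoef(y_hat, y[test])[0, 1]  # Pearson r on held-out individuals
print(f"hold-out prediction accuracy: {accuracy:.2f}")
```

In practice the accuracy would be averaged over repeated cross-validation folds, and the variance components would be estimated from the training data.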
To improve prediction accuracy, we assigned different scores to regions of the sugarcane genome based on gene annotation information, thereby giving different weights to the SNPs located in those regions. Additionally, we incorporated inbred and outbred populations as fixed effects into the model. The optimized SSBLUP model achieved a prediction accuracy of 0.44, an absolute improvement of 0.17 over the original SSBLUP model and of 0.09 over the originally best-performing GBLUP model (0.35). These results indicate that it is crucial to fully consider genomic structural regions, population structure characteristics, and fixed effects in GS predictions.
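The two optimizations described above, annotation-based SNP weights and a population fixed effect, can be sketched in a simplified form. This is a weighted GBLUP illustration only, not the single-step (SSBLUP) model of the study, which additionally combines pedigree and genomic relationships. The weight values (2.0 for a hypothetical "genic" class, 1.0 otherwise), the scaling of the weighted relationship matrix, the simulated data, and the function names `weighted_g` and `blup_with_fixed` are all assumptions made for illustration.

```python
import numpy as np

def weighted_g(M, w):
    """Weighted relationship matrix Z D Z' with D = diag(w), w rescaled to mean 1."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    w = w / w.mean()
    return (Z * w) @ Z.T / (2.0 * np.sum(w * p * (1.0 - p)))

def blup_with_fixed(G, X, y, train, test, lam):
    """GLS estimate of fixed effects, then BLUP of genetic values for test rows."""
    V = G[np.ix_(train, train)] + lam * np.eye(len(train))
    Xt, yt = X[train], y[train]
    beta = np.linalg.solve(Xt.T @ np.linalg.solve(V, Xt),
                           Xt.T @ np.linalg.solve(V, yt))
    alpha = np.linalg.solve(V, yt - Xt @ beta)
    return X[test] @ beta + G[np.ix_(test, train)] @ alpha, beta

# Simulated toy data; group sizes mirror the abstract, values do not.
rng = np.random.default_rng(1)
n_a, n_b, m = 299, 108, 300
n = n_a + n_b
M = rng.binomial(2, rng.uniform(0.1, 0.9, size=m), size=(n, m)).astype(float)

# Hypothetical annotation: the first 100 SNPs are "genic" and get a higher score.
genic = np.arange(m) < 100
w = np.where(genic, 2.0, 1.0)

# Population membership as a fixed effect (intercept + indicator column).
group = np.r_[np.zeros(n_a), np.ones(n_b)]
X = np.column_stack([np.ones(n), group])

# Phenotype: group shift + genetic value (larger genic effects) + noise.
effects = rng.normal(size=m) * np.where(genic, 1.5, 0.5)
g = M @ effects
g = (g - g.mean()) / g.std()
y = 0.5 * group + g + rng.normal(scale=0.7, size=n)

idx = rng.permutation(n)
test, train = idx[:80], idx[80:]

Gw = weighted_g(M, w)
y_hat, beta = blup_with_fixed(Gw, X, y, train, test, lam=0.5)
accuracy = np.corrcoef(y_hat, y[test])[0, 1]
print(f"estimated group effect: {beta[1]:.2f}, hold-out accuracy: {accuracy:.2f}")
```

Setting all weights equal recovers the unweighted relationship matrix, so the same code can be used to compare a weighted model against its unweighted baseline on identical folds.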