A Modified Bayesian Optimization Approach for Determining a Training Set to Identify the Best Genotypes from a Candidate Population in Genomic Selection

Hui-Ning Tu,Chen-Tuo Liao
DOI: https://doi.org/10.1007/s13253-024-00632-y
2024-06-21
Journal of Agricultural Biological and Environmental Statistics
Abstract:Training set optimization is a crucial factor affecting the probability of success for plant breeding programs using genomic selection. Conventionally, the training set optimization is developed to maximize Pearson's correlation between true breeding values and genomic estimated breeding values for a testing population, because it is an essential component of genetic gain in plant breeding. However, many practical breeding programs aim to identify the best genotypes for target traits in a breeding population. A modified Bayesian optimization approach is therefore developed in this study to construct training sets for tackling such an interesting problem. The proposed approach is based on Monte Carlo simulation and data cross-validation, which is shown to be competitive with the existing methods developed to achieve the maximal Pearson's correlation. Four real genome datasets, including two rice, one wheat, and one soybean, are analyzed in this study. An R package is generated to facilitate the application of the proposed approach. Supplementary materials accompanying this paper appear online.
statistics & probability,mathematical & computational biology,biology
What problem does this paper attempt to address?