Machine learning algorithms translate big data into predictive breeding accuracy

José Crossa, Osval A Montesinos-Lopez, Germano Costa-Neto, Paolo Vitale, Johannes W R Martini, Daniel Runcie, Roberto Fritsche-Neto, Abelardo Montesinos-Lopez, Paulino Pérez-Rodríguez, Guillermo Gerard, Susanna Dreisigacker, Leonardo Crespo-Herrera, Carolina Saint Pierre, Morten Lillemo, Jaime Cuevas, Alison Bentley, Rodomiro Ortiz
DOI: https://doi.org/10.1016/j.tplants.2024.09.011
2024-10-26
Abstract: Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review, we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.