Abstract: Background Genomic prediction aims to predict the breeding values of multiple complex traits. The widely used statistical methods usually assume that these traits are normally distributed, which imposes linear genetic correlations between traits. While these methods are of great value for genomic prediction, they do not account for nonlinear genetic relationships between traits. If such relationships exist, statistical models provide a fair linear approximation, but the nonlinearity limits their prediction accuracy. Deep learning (DL) is a promising methodology for predicting multiple complex traits when nonlinear genetic relationships are present, because it can capture complex, nonlinear patterns in large data sets. We proposed a novel hybrid model, DLGBLUP, which takes the output of traditional genomic best linear unbiased prediction (GBLUP) and enhances its predicted genetic values (PGV) by using DL to account for nonlinear genetic relationships between traits. Using simulated data, we compared the accuracy of the PGV obtained with the proposed hybrid DLGBLUP model, a DL model, and the traditional GBLUP model, which served as our baseline reference.
Results We found that both the DL and DLGBLUP models either outperformed GBLUP or produced equally accurate PGV, with particularly large gains in accuracy for traits with a strongly nonlinear genetic relationship. Overall, DLGBLUP had the highest prediction accuracy, up to 0.2 points higher than GBLUP, and the smallest mean squared error of the PGV for all traits. Additionally, we evolved a base population over seven generations and compared the genetic progress when individuals were selected on the additive PGV obtained from DL, DLGBLUP, or GBLUP. For all traits with a nonlinear genetic relationship, after the fourth generation, the genetic gain observed when selection was based on the additive PGV from GBLUP was always lower than the gain achieved with either DL or DLGBLUP.
Conclusions Integrating DL into genomic prediction makes it possible to model nonlinear relationships between traits. By identifying these nonlinear genetic relationships, our DL and DLGBLUP models improved prediction accuracy compared to GBLUP. The possibility of nonlinear relationships between traits offers a different perspective on multi-trait evaluation and prediction, as well as on the evolution of traits over generations, with the potential to further improve selection strategies in commercial livestock breeding programs. DLGBLUP also shows that DL can complement statistical methods by enhancing their performance.
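
The two-stage idea behind the hybrid model (GBLUP first, then a nonlinear step that refines its PGV across traits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulated data, the marker and sample sizes, the quadratic relationship between the two traits, and the use of out-of-fold GBLUP predictions as second-stage inputs are all hypothetical choices, and a polynomial least-squares regression stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: sizes and effect model are illustrative assumptions.
n, m, n_train = 1200, 300, 800
freq = rng.uniform(0.1, 0.9, m)                        # allele frequencies
M = rng.binomial(2, freq, size=(n, m)).astype(float)   # genotypes coded 0/1/2
Z = M - 2 * freq                                       # centered genotypes
G = Z @ Z.T / (2 * np.sum(freq * (1 - freq)))          # genomic relationship matrix

def standardize(x):
    return (x - x.mean()) / x.std()

# Trait 1 is additive; trait 2 is a quadratic (nonlinear) function of trait 1.
g1 = standardize(Z @ rng.normal(size=m))
g2 = standardize(g1 ** 2)
true_g = np.column_stack([g1, g2])
h2 = 0.5
Y = true_g + rng.normal(scale=np.sqrt((1 - h2) / h2), size=(n, 2))
lam = (1 - h2) / h2                                    # residual / genetic variance ratio

def gblup(G, Y, fit_idx, pred_idx, lam):
    """Single-trait GBLUP per column: PGV for pred_idx using records in fit_idx."""
    mu = Y[fit_idx].mean(axis=0)
    K = G[np.ix_(fit_idx, fit_idx)] + lam * np.eye(len(fit_idx))
    alpha = np.linalg.solve(K, Y[fit_idx] - mu)
    return G[np.ix_(pred_idx, fit_idx)] @ alpha

def features(P):
    """Second-order polynomial features of the two-trait PGV (stand-in for a DL model)."""
    a, b = P[:, 0], P[:, 1]
    return np.column_stack([np.ones(len(P)), a, b, a * a, b * b, a * b])

def accuracy(pred, truth):
    """Correlation between predictions and true genetic values, per trait."""
    return np.array([np.corrcoef(pred[:, k], truth[:, k])[0, 1]
                     for k in range(truth.shape[1])])

train, test = np.arange(n_train), np.arange(n_train, n)

# Stage 1: GBLUP. Training inputs for stage 2 are out-of-fold predictions
# (an assumption made here to limit overfitting of the second stage).
pgv_train = np.empty((n_train, 2))
for fold in np.array_split(rng.permutation(train), 5):
    rest = np.setdiff1d(train, fold)
    pgv_train[fold] = gblup(G, Y, rest, fold, lam)
pgv_test = gblup(G, Y, train, test, lam)

# Stage 2: nonlinear map from the GBLUP PGV of all traits to the phenotypes.
W, *_ = np.linalg.lstsq(features(pgv_train), Y[train], rcond=None)
hybrid_test = features(pgv_test) @ W

acc_gblup = accuracy(pgv_test, true_g[test])
acc_hybrid = accuracy(hybrid_test, true_g[test])
print("GBLUP accuracy per trait: ", np.round(acc_gblup, 2))
print("Hybrid accuracy per trait:", np.round(acc_hybrid, 2))
```

In this toy setting the second trait has almost no additive signal, so a linear model has little to work with, while the second stage can recover part of it from the PGV of the first trait. Replacing `features` and the least-squares fit with a neural network would bring the sketch closer to a DL-based second stage.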

Deep Learning and GBLUP Integration: An Approach that Identifies Nonlinear Genetic Relationships Between Traits
