Confidence intervals for validation statistics with data truncation in genomic prediction

Matias Bermann, Andres Legarra, Alejandra Alvarez Munera, Ignacy Misztal, Daniela Lourenco
DOI: https://doi.org/10.1186/s12711-024-00883-w
2024-03-09
Genetics Selection Evolution
Abstract: Validation by data truncation is common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations rely on a single data partition: predictivity, or predictive ability (the correlation between pre-adjusted phenotypes and estimated breeding values (EBV), divided by the square root of the heritability), and the linear regression (LR) method (a comparison of "early" and "late" EBV). Both methods compare predictions using the whole dataset and a partial dataset, the latter obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for predictivity, or against EBV obtained with the whole dataset for the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation over different samples (or folds) or by bootstrapping. Analytical confidence intervals would be beneficial because they avoid running several validations and make it possible to test the quality of bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method.
genetics & heredity; agriculture, dairy & animal science
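
The predictivity statistic and the bootstrap intervals mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `predictivity` and `bootstrap_ci`, the choice of a percentile bootstrap that resamples validation individuals, and the simulated inputs are all assumptions made for the example.

```python
import numpy as np


def predictivity(y_adj, ebv_partial, h2):
    """Correlation between pre-adjusted phenotypes and partial-data EBV,
    divided by the square root of the heritability."""
    r = np.corrcoef(y_adj, ebv_partial)[0, 1]
    return r / np.sqrt(h2)


def bootstrap_ci(y_adj, ebv_partial, h2, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for predictivity (assumed scheme:
    validation individuals are resampled with replacement)."""
    y_adj = np.asarray(y_adj, dtype=float)
    ebv_partial = np.asarray(ebv_partial, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_adj)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of individuals
        stats[b] = predictivity(y_adj[idx], ebv_partial[idx], h2)
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Simulated validation set (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n, h2 = 500, 0.3
tbv = rng.normal(0.0, np.sqrt(h2), n)               # simulated true breeding values
y_adj = tbv + rng.normal(0.0, np.sqrt(1 - h2), n)   # simulated adjusted phenotypes
ebv_partial = 0.6 * tbv + rng.normal(0.0, 0.3, n)   # simulated partial-data EBV

point = predictivity(y_adj, ebv_partial, h2)
lower, upper = bootstrap_ci(y_adj, ebv_partial, h2)
print(f"predictivity = {point:.3f}, 95% bootstrap CI = ({lower:.3f}, {upper:.3f})")
```

Each bootstrap replicate here only recomputes a correlation, which is cheap. Replicating the validation over different samples or folds, by contrast, requires a new genetic evaluation for each one, and that cost is what an analytical interval would avoid.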