Abstract: Historical data collection for genetic evaluation purposes is a common practice in animal populations; however, the larger the dataset, the higher the computing power needed to perform the analyses. Also, fitting the same model to historical and recent data may be inappropriate. Data truncation can reduce the number of equations to solve, consequently decreasing computing costs; however, the large volume of genotypes is responsible for most of the increase in computations. This study aimed to assess the impact of removing genotypes along with phenotypes and pedigree on the computing performance, reliability, and inflation of genomic predicted breeding values (GEBV) from single-step genomic best linear unbiased predictor for selection candidates. Data from two pig lines, a terminal sire line (L1) and a maternal line (L2), were analyzed in this study. Four analyses were implemented: growth and "weaning to finish" mortality on L1, and pre-weaning and reproductive traits on L2. Four genotype removal scenarios were proposed: removing genotyped animals without phenotypes and progeny (noInfo), removing genotyped animals based on birth year (Age), the combination of the noInfo and Age scenarios (noInfo + Age), and no genotype removal (AllGen). In all scenarios, phenotypes were removed based on birth year, and three pedigree depths were tested: two generations traced back, three generations traced back, and the entire pedigree. The full dataset contained 1,452,257 phenotypes for growth traits, 324,397 for weaning to finish mortality, 517,446 for pre-weaning traits, and 7,853,629 for reproductive traits in pure and crossbred pigs. Pedigree files for lines L1 and L2 comprised 3,601,369 and 11,240,865 animals, of which 168,734 and 170,121 were genotyped, respectively. In each truncation scenario, the linear regression method was used to assess the reliability and dispersion of GEBV for genotyped parents (born after 2019).
The number of years of data that could be removed without harming reliability depended on the number of records, the type of analysis (multitrait vs. single trait), the heritability of the trait, and the data structure. All scenarios had similar reliabilities, except for noInfo, which performed better in the growth analysis. Based on the data used in this study, using only the last ten years of phenotypes, tracing the pedigree three generations back, and removing genotyped animals that contribute neither own nor progeny phenotypes increases computing efficiency without changing the ability to predict breeding values.
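The four genotype removal scenarios described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Animal` record, the function names, and the `min_birth_year` cutoff are hypothetical, and it assumes that noInfo + Age removes an animal if either rule applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    """Hypothetical record; field names are illustrative, not from the study."""
    animal_id: str
    birth_year: int
    genotyped: bool
    has_phenotype: bool
    has_progeny: bool


def keep_genotype(animal, scenario, min_birth_year):
    """Return True if the animal's genotype stays in the evaluation."""
    if not animal.genotyped:
        return False
    # noInfo rule: no own phenotype and no progeny
    uninformative = not animal.has_phenotype and not animal.has_progeny
    # Age rule: born before the (assumed) cutoff year
    too_old = animal.birth_year < min_birth_year
    if scenario == "AllGen":
        return True
    if scenario == "noInfo":
        return not uninformative
    if scenario == "Age":
        return not too_old
    if scenario == "noInfo+Age":
        # Assumption: removed if either rule applies
        return not (uninformative or too_old)
    raise ValueError(f"unknown scenario: {scenario}")


def retained_genotypes(animals, scenario, min_birth_year):
    """List the IDs of animals whose genotypes are kept under a scenario."""
    return [a.animal_id for a in animals
            if keep_genotype(a, scenario, min_birth_year)]


# Toy data, invented for illustration only
animals = [
    Animal("A", 2012, True, True, True),
    Animal("B", 2012, True, False, False),
    Animal("C", 2020, True, False, False),
    Animal("D", 2021, True, True, False),
    Animal("E", 2021, False, True, False),
]

for scenario in ("AllGen", "noInfo", "Age", "noInfo+Age"):
    print(scenario, retained_genotypes(animals, scenario, min_birth_year=2015))
```

Keeping each rule as a separate boolean makes the combined scenario a simple union of the two removal criteria, which is one plausible reading of "the combination of the noInfo and Age scenarios".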

Boundaries for genotype, phenotype, and pedigree truncation in genomic evaluations in pigs
