Abstract: With the rapid development of animal phenomics and deep phenotyping, thousands of traditional and molecular phenotypes can now be recorded per individual. However, how to handle this volume of data in animal breeding remains largely unexplored, and the challenge is likely to become more common. This study aimed to (1) explore the use of the Mega-scale linear mixed model (MegaLMM), a factor model-based approach able to simultaneously estimate (co)variance components and genetic parameters for thousands of milk traits, in what are hereafter called thousand-trait (TT) models; (2) compare the predictions of phenotype values and genomic breeding values (u) for focal traits (i.e., traits targeted for prediction, as opposed to secondary traits that support the evaluation) from single-trait (ST) and TT models; and (3) propose a new approximate method for predicting estimated genomic breeding values (U) with TT models and MegaLMM. A total of 3,421 milk mid-infrared (MIR) spectra wavepoints (called secondary traits) and 3 focal traits [average fat percent (Fat), average methane (CH4), and average somatic cell score (SCS)] collected on 3,302 first-parity Holstein cows were used. The 3,421 milk MIR wavepoint traits consisted of 311 wavepoints in each of 11 classes (months in lactation). Genotypes for 564,439 SNP were available for all animals and were used to calculate the genomic relationship matrix. MegaLMM was implemented in the framework of the Bayesian sparse factor model and solved through Gibbs sampling (Markov chain Monte Carlo). The heritabilities of the 3,421 milk MIR wavepoints gradually increased and then decreased in units of 311 wavepoints throughout the lactation. The genetic and phenotypic correlations between the first 311 wavepoints and the other 3,110 wavepoints were low.
The accuracies of phenotype predictions from the ST model were lower than those from the TT model for Fat (0.51 vs. 0.93), CH4 (0.30 vs. 0.86), and SCS (0.14 vs. 0.33). The same trend was observed for the accuracies of u predictions: Fat (0.59 vs. 0.86), CH4 (0.47 vs. 0.78), and SCS (0.39 vs. 0.59). The average correlation between U predicted from the TT model and U predicted by the new approximate method was 0.90. The new approximate method for estimating U in MegaLMM will enhance the suitability of MegaLMM for applications in animal breeding. This study is an initial investigation into the use of thousands of traits in animal breeding and showed that the TT model is beneficial for the prediction of focal traits (phenotype and breeding values), especially for difficult-to-measure traits (e.g., CH4).
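
The abstract states that SNP genotypes were used to calculate the genomic relationship matrix but does not say which formula was applied. As a minimal sketch only, the following Python example assumes the widely used first method of VanRaden (2008) with genotypes coded 0/1/2. The function name, the toy data, and the choice of formula are all illustrative assumptions and are not taken from the study.

```python
import numpy as np


def genomic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Build a genomic relationship matrix from an (animals x SNPs) matrix.

    Assumption: VanRaden (2008) method 1 with genotypes coded as 0/1/2 copies
    of the counted allele. The abstract does not name the formula it used.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0            # frequency of the counted allele per SNP
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic SNPs (no information)
    M, p = M[:, keep], p[keep]
    Z = M - 2.0 * p                     # centre each SNP column at its mean
    scale = 2.0 * np.sum(p * (1.0 - p)) # expected genotype variance summed over SNPs
    return Z @ Z.T / scale


# Hypothetical toy data: 6 animals and 200 SNPs (the study had 3,302 and 564,439).
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(6, 200))
G = genomic_relationship_matrix(toy_genotypes)
print(G.shape)
```

The resulting matrix is symmetric with one row and column per animal. In a mixed model it would serve as the covariance structure of the genomic breeding values, which is the role the abstract assigns to it.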

Prediction of representative phenotypes using Multi-Attribute Subset Selection

Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations

Phenotype Prediction using a Tensor Representation and Deep Learning from Data Independent Acquisition Mass Spectrometry

Pleiotropic genetic association analysis with multiple phenotypes using multivariate response best-subset selection

MMAP: a cloud computing platform for mining the maximum accuracy of predicting phenotypes from genotypes

Predicting gene phenotype by multi-label multi-class model based on essential functional features

Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data

Multitask Knowledge-primed Neural Network for Predicting Missing Metadata and Host Phenotype based on Human Microbiome

A Novel Feature Ensemble Technology To Improve Prediction Performance Of Multiple Heterogeneous Phenotypes Based On Microarray Data

Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes

Megavariate Methods Capture Complex Genotype-by-Environment Interactions

Using drone-retrieved multispectral data for phenomic selection in potato breeding

A Population Genetics Model of Marker-Assisted Selection

Multi-Species Prediction of Physiological Traits with Hyperspectral Modeling

MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework

Exploring a Bayesian sparse factor model-based strategy for the genetic analysis of thousands of MIR-spectra traits for animal breeding

Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data

A supervised Bayesian factor model for the identification of multi-omics signatures

Prediction and association mapping of agronomic traits in maize using multiple omic data

MegaLMM improves genomic predictions in new environments using environmental covariates

Dual-extraction modeling: A multi-modal deep-learning architecture for phenotypic prediction and functional gene mining of complex traits