An Ensemble Learning Approach for Predicting Phenotypes from Genotypes

Tingxi Yu, Wuping Zhang, Jiwan Han, Fuzhong Li, Zhihong Wang, Chunqing Cao
DOI: https://doi.org/10.1109/iucc-cit-dsci-smartcns55181.2021.00068
2021-01-01
Abstract: Genomic selection (GS) is a breeding strategy that estimates breeding values from high-density markers covering the whole genome, then ranks and selects candidates on that basis. In recent years, breeders have worked to optimize models to improve the precision of genomic predictions. With the rapid development of artificial intelligence, machine learning algorithms are increasingly used for genomic selection. However, the prediction ability of a single machine learning algorithm in GS is unsatisfactory. In this study, we constructed an ensemble learning-based Genomic Prediction model (ELGP), which integrates eight machine learning methods, for predicting phenotypes from genotypes. The experimental results demonstrate that ELGP outperformed all eight base learners. For the milk yield (MY), milk fat percentage (MFP), and somatic cell score (SCS) traits, the Pearson correlation coefficient of ELGP was 12.14%, 14.99%, and 15.56% higher, respectively, than the average of the eight base learners, and the Radius index of ELGP was 15.11%, 13.72%, and 18.29% lower, respectively. ELGP was more robust than BRR for every trait except SCS. The ELGP model therefore has great potential to enhance prediction ability in other animals and plants. Moreover, we recommend selecting an appropriate k-fold cross-validation scheme to improve the prediction ability of the model.
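To make the reported comparison concrete, the following is a minimal sketch of how a relative gain in Pearson correlation over the average of several base learners could be computed. The abstract does not state how ELGP combines its base learners or how the percentages were derived, so the simple averaging ensemble, the function names, and the toy numbers below are all illustrative assumptions, not the authors' method or data.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ensemble(predictions):
    """Assumed combination rule: average the base learners' predictions per sample."""
    return [mean(column) for column in zip(*predictions)]


def relative_improvement(ensemble_score, base_scores):
    """Percentage change of the ensemble score relative to the mean base score."""
    baseline = mean(base_scores)
    return 100.0 * (ensemble_score - baseline) / baseline


# Toy phenotypes and two hypothetical base learners (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
base_predictions = [
    [1.5, 1.5, 3.5, 3.5, 5.5],
    [0.5, 2.5, 2.5, 4.5, 4.5],
]

ensemble_prediction = average_ensemble(base_predictions)
base_r = [pearson(p, observed) for p in base_predictions]
ensemble_r = pearson(ensemble_prediction, observed)
gain = relative_improvement(ensemble_r, base_r)

print(f"base r: {[round(r, 4) for r in base_r]}")
print(f"ensemble r: {ensemble_r:.4f}, relative gain: {gain:.2f}%")
```

In this toy case the two base learners make errors in opposite directions, so averaging cancels them, which is the general intuition behind combining several learners. For a metric where lower is better, the same formula yields a negative value that would be read as a reduction.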