Evaluation of Deep Learning for predicting rice traits using structural and single-nucleotide genomic variants

Ioanna-Theoni Vourlaki, Sebastián E. Ramos-Onsins, Miguel Pérez-Enciso, Raúl Castanera
DOI: https://doi.org/10.1101/2024.01.18.576088
2024-01-22
Abstract: Structural variants (SVs) such as deletions, inversions, duplications, and Transposable Element (TE) Insertion Polymorphisms (TIPs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most methods in quantitative genetics focusing on crop improvement, such as genomic prediction, consider Single Nucleotide Polymorphisms (SNPs) as the only type of genetic marker. Here, we used rice to investigate whether combining structural and nucleotide genome-wide variation can improve the prediction ability for traits compared to using only SNPs. Moreover, we examine the potential advantage of Deep Learning (DL) networks over Bayesian linear models, which have been widely applied in genomic prediction. Specifically, the performance of BayesC and a Bayesian Reproducing Kernel Hilbert Space (RKHS) regression was compared to that of two different DL architectures, the Multilayer Perceptron and the Convolutional Neural Network. We further explore their prediction ability by using various marker input strategies and found that exploiting structural and nucleotide variation improves prediction ability for complex traits in rice. Also, DL models outperformed Bayesian models in 75% of the studied cases. Finally, DL systematically improved the prediction ability for binary traits relative to the Bayesian models.
Genomics
What problem does this paper attempt to address?
The paper asks whether trait prediction in rice (Oryza sativa) can be improved by combining structural variants (SVs) with single nucleotide polymorphisms (SNPs), and whether deep learning (DL) offers an advantage over established Bayesian models. The authors compare two DL architectures, the multilayer perceptron and the convolutional neural network, against BayesC and a Bayesian Reproducing Kernel Hilbert Space (RKHS) regression, evaluating each method on both complex and binary traits. The study also explores different marker input strategies, including selecting the most relevant markers, using linked SNPs, and applying principal component analysis, to optimize prediction performance.
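The marker input strategies mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are random placeholders, the matrix sizes are arbitrary, and the helper names `top_k_markers` and `pca_scores` are hypothetical. Ranking markers by absolute Pearson correlation with the trait is an assumption made here to keep the example short; the paper's own selection criterion may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not from the paper): 60 accessions,
# 200 SNPs coded 0/1/2 and 40 SVs coded 0/1 (absence/presence).
n_lines, n_snps, n_svs = 60, 200, 40
snps = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)
svs = rng.integers(0, 2, size=(n_lines, n_svs)).astype(float)
trait = rng.normal(size=n_lines)  # placeholder phenotype

# Strategy 1: combine nucleotide and structural variation in one input matrix.
combined = np.hstack([snps, svs])


def top_k_markers(X, y, k):
    """Indices of the k columns of X with the largest |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # monomorphic markers get correlation 0
    r = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(r))[:k]


def pca_scores(X, n_components):
    """Principal component scores of the column-centered marker matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Strategy 2: keep only the markers most associated with the trait.
selected = combined[:, top_k_markers(combined, trait, k=50)]

# Strategy 3: replace the markers with a few principal components.
pcs = pca_scores(combined, n_components=10)

print(combined.shape, selected.shape, pcs.shape)
```

Any of the three matrices could then serve as input to a linear or DL model. In a real evaluation, marker selection would normally be performed inside each training fold so that test phenotypes do not influence which markers are kept.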