Abstract: Gao et al. propose several approaches to incorporate gene annotation into genomic prediction and evaluate these new methods on populations...

Today, genomic prediction (GP) is an established technology in plant and animal breeding programs. Current standard methods are based purely on statistical considerations and do not make use of the abundant biological knowledge that is readily available from public databases. Two major questions have to be answered before biological prior information can be used routinely in GP approaches: which types of information can be used, and at which points they can be incorporated into prediction methods. In this study, we propose a novel strategy to incorporate gene annotation into GP of complex phenotypes by defining haploblocks according to gene positions. Haplotype effects are then modeled as categorical or as numerical allele dosage variables. The underlying concept of this approach is to build the statistical model on variables representing the biologically functional units. We evaluate the new methods with data from a heterogeneous stock mouse population, the Drosophila Genetic Reference Panel (DGRP), and a rice breeding population from the Rice Diversity Panel. Our results show that using gene annotation to define haploblocks often leads to a comparable, and for some traits a higher, predictive ability compared to SNP-based models or to haplotype models that do not use gene annotation information. Modeling gene interaction effects can further improve predictive ability. We also illustrate that adding a second, separate relatedness matrix built from markers that have not been mapped to any gene in many cases does not lead to a relevant additional increase in predictive ability when the first matrix is based on haploblocks defined with gene annotation data, suggesting that intergenic markers provide only redundant information in the data sets considered. Therefore, gene annotation information seems appropriate for capturing the importance of DNA segments. Finally, we discuss the effects of gene annotation quality, marker density, and linkage disequilibrium on the performance of the new methods. To our knowledge, this is the first work that incorporates epistatic interaction or gene annotation into haplotype-based prediction approaches.
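The idea of gene-defined haploblocks with numerical allele dosage coding can be illustrated with a small sketch. This is not the authors' implementation: the function names (`haploblocks_from_genes`, `haplotype_dosage`, `relatedness`), the toy SNP positions and gene intervals, and the simple centered cross-product used as a relatedness matrix are all assumptions made for illustration. Phased 0/1 haplotypes are assumed as input, and only the dosage coding is shown, not the categorical one.

```python
import numpy as np

def haploblocks_from_genes(snp_pos, genes):
    """Return, for each gene interval (start, end), the indices of SNPs inside it."""
    blocks = []
    for start, end in genes:
        idx = np.where((snp_pos >= start) & (snp_pos <= end))[0]
        if idx.size > 0:  # genes without any marker define no block
            blocks.append(idx)
    return blocks

def haplotype_dosage(haps, blocks):
    """Code each distinct haplotype allele of each block as a 0/1/2 dosage column.

    haps: array of shape (n_individuals, 2, n_snps) with phased 0/1 alleles.
    """
    n = haps.shape[0]
    columns = []
    for idx in blocks:
        # Stack both chromosomes of all individuals: one row per haplotype.
        block = haps[:, :, idx].reshape(2 * n, idx.size)
        alleles, inverse = np.unique(block, axis=0, return_inverse=True)
        inverse = np.asarray(inverse).reshape(n, 2)
        for a in range(alleles.shape[0]):
            # Number of copies of haplotype allele `a` carried by each individual.
            columns.append((inverse == a).sum(axis=1))
    return np.column_stack(columns).astype(float)

def relatedness(Z):
    """Assumed simple relatedness matrix: centered cross-product of dosages."""
    Zc = Z - Z.mean(axis=0)
    return Zc @ Zc.T / Z.shape[1]

# Toy data (hypothetical): 5 individuals, 6 SNPs, 2 annotated genes.
rng = np.random.default_rng(0)
haps = rng.integers(0, 2, size=(5, 2, 6))
snp_pos = np.array([100, 150, 300, 900, 950, 2000])
genes = [(90, 320), (880, 1000)]  # the SNP at position 2000 is intergenic

blocks = haploblocks_from_genes(snp_pos, genes)
Z = haplotype_dosage(haps, blocks)
G = relatedness(Z)
```

In this sketch the intergenic SNP is simply left out of `Z`; a second relatedness matrix for unmapped markers, as discussed in the abstract, would be built separately from those columns.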

GC²NMF: A Novel Matrix Factorization Framework for Gene–Phenotype Association Prediction

Weighted deep factorizing heterogeneous molecular network for genome-phenome association prediction

GCN-MF: Disease-Gene Association Identification By Graph Convolutional Networks and Matrix Factorization

Mining Functional Modules by Multiview-NMF of Phenome-Genome Association

Genome-Phenome Association Prediction by Deep Factorizing Heterogeneous Molecular Network

NMFGO: Gene Function Prediction Via Nonnegative Matrix Factorization with Gene Ontology

Novel Collaborative Weighted Non-negative Matrix Factorization Improves Prediction of Disease-Associated Human Microbes

Flexible Non-Negative Matrix Factorization to Unravel Disease-Related Genes

Gauss-Seidel Based Non-Negative Matrix Factorization for Gene Expression Clustering

An NMF-L2,1-Norm Constraint Method for Characteristic Gene Selection

Incorporating Gene Annotation into Genomic Prediction of Complex Phenotypes

Joint Nonnegative Matrix Factorization Based on Sparse and Graph Laplacian Regularization for Clustering and Co-Differential Expression Genes Analysis

Genomic Prediction of Complex Phenotypes Using Genic Similarity Based Relatedness Matrix

Graph Convolutional Network with Neural Inductive Matrix Completion for Predicting Disease-Related LncRNA Genes

A Robust Manifold Graph Regularized Nonnegative Matrix Factorization Algorithm for Cancer Gene Clustering

WGMFDDA: A Novel Weighted-Based Graph Regularized Matrix Factorization for Predicting Drug-Disease Associations

Deep Collaborative Filtering for Prediction of Disease Genes

Graph Convolutional Neural Network with Multi-Layer Attention Mechanism for Predicting Potential Microbe-Disease Associations

Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction

Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions