GC\(^2\)NMF: A Novel Matrix Factorization Framework for Gene–Phenotype Association Prediction

Yaogong Zhang, Jiahui Liu, Xiaohu Liu, Yuxiang Hong, Xin Fan, Yalou Huang, Yuan Wang, Maoqiang Xie
DOI: https://doi.org/10.1007/s12539-018-0296-1
2018-01-01
Interdisciplinary Sciences Computational Life Sciences
Abstract: Gene–phenotype association prediction can reveal the inherited basis of human diseases and facilitate drug development. Gene–phenotype associations arise from complex biological processes and are influenced by various factors, such as the relationships among phenotypes and those among genes. However, because curated gene–phenotype associations are sparse and the joint effect of multiple factors has not been analyzed in an integrated way, existing methods are limited in prediction accuracy and in their ability to detect potential gene–phenotype associations. In this paper, we propose a novel method, Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC\(^2\)NMF), which inherits the advantages of Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes. Specifically, we first introduce the depth of the parent–child relationship between two adjacent phenotypes in the hierarchical phenotype data as a weighted graph constraint, for a better understanding of phenotypes. Second, we use the intra-group correlation among genes in a gene group as a group constraint, for a better understanding of genes. This information reflects the intuition that genes in the same group probably result in similar phenotypes. The model not only achieves high prediction performance but also learns interpretable representations of genes and phenotypes simultaneously, which facilitates future biological analysis. Experimental results on mouse and human gene–phenotype association datasets demonstrate that GC\(^2\)NMF obtains superior prediction accuracy and good interpretability for biological explanation compared with other state-of-the-art methods.
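To make the idea of constraining an NMF with side information concrete, here is a minimal sketch of a generic graph-regularized NMF with multiplicative updates. It is not the GC\(^2\)NMF objective from the paper, which the abstract does not state. As a simplifying assumption, both the phenotype hierarchy and the gene groups are encoded as Laplacian penalties, and every name (`graph_regularized_nmf`, `W_p`, `W_g`, `alpha`, `beta`), the toy matrices, and the edge weights are hypothetical.

```python
import numpy as np

def objective(X, U, V, W_p, W_g, alpha, beta):
    """Reconstruction error plus Laplacian penalties on both factor matrices."""
    L_p = np.diag(W_p.sum(axis=1)) - W_p   # phenotype graph Laplacian
    L_g = np.diag(W_g.sum(axis=1)) - W_g   # gene graph Laplacian
    fit = np.linalg.norm(X - U @ V.T) ** 2
    return fit + alpha * np.trace(V.T @ L_p @ V) + beta * np.trace(U.T @ L_g @ U)

def graph_regularized_nmf(X, W_p, W_g, k=2, alpha=0.1, beta=0.1, n_iter=200, seed=0):
    """Factor X (genes x phenotypes) as U @ V.T with non-negative U and V.

    W_p: symmetric non-negative phenotype similarity (assumed to come from
         weighted parent-child edges of a phenotype hierarchy).
    W_g: symmetric non-negative gene similarity (assumed 1 for same-group genes).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_phen = X.shape
    U = rng.random((n_genes, k))
    V = rng.random((n_phen, k))
    D_p = np.diag(W_p.sum(axis=1))
    D_g = np.diag(W_g.sum(axis=1))
    eps = 1e-9  # avoids division by zero
    history = [objective(X, U, V, W_p, W_g, alpha, beta)]
    for _ in range(n_iter):
        # Standard multiplicative updates for Laplacian-regularized NMF.
        U *= (X @ V + beta * (W_g @ U)) / (U @ (V.T @ V) + beta * (D_g @ U) + eps)
        V *= (X.T @ U + alpha * (W_p @ V)) / (V @ (U.T @ U) + alpha * (D_p @ V) + eps)
        history.append(objective(X, U, V, W_p, W_g, alpha, beta))
    return U, V, history

# Toy data (hypothetical): 4 genes x 3 phenotypes, 1 = known association.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)
# Phenotype 0 is the parent of phenotype 1; illustrative edge weight 0.5.
W_p = np.array([[0.0, 0.5, 0.0],
                [0.5, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
# Genes 0 and 1 form one group, genes 2 and 3 another.
W_g = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)

U, V, history = graph_regularized_nmf(X, W_p, W_g)
scores = U @ V.T  # higher score = more plausible gene-phenotype association
```

In this sketch, unobserved entries of `X` with a high value in `scores` would be read as candidate associations, and the rows of `U` and `V` serve as low-dimensional representations of genes and phenotypes.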