Deep Collaborative Filtering for Prediction of Disease Genes

Xiangxiang Zeng, Yinglai Lin, Yuying He, Linyuan Lu, Xiaoping Min, Alfonso Rodriguez-Paton
DOI: https://doi.org/10.1109/tcbb.2019.2907536
2019-01-01
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Accurate prioritization of potential disease genes is a fundamental challenge in biomedical research, and various algorithms have been developed to address it. Inductive Matrix Completion (IMC) is one of the most reliable models, owing to its well-established framework and its superior performance in predicting gene-disease associations. However, the IMC method does not hierarchically extract deep features, which might limit the quality of recovery. To address this, our Deep Collaborative Filtering (DCF) model applies a deep learning architecture, which obtains high-level representations and handles the noise and outliers present in large-scale biological datasets, to the side information of genes. Further, because negative examples are lacking, we also apply a Positive-Unlabeled (PU) learning formulation to low-rank matrix completion. Our approach achieves substantially improved performance over other state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database. Our approach is 10 percent more efficient than standard IMC in detecting a true association, and significantly outperforms other alternatives in terms of the precision-recall metric at the top-k predictions. Moreover, we also validate the model on diseases with no previously known gene associations and on newly reported OMIM associations. The experimental results show that DCF remains satisfactory for ranking novel disease phenotypes as well as for mining unexplored relationships. The source code and the data are available at https://github.com/xzenglab/DCF .
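To make the combination of inductive matrix completion and PU learning described in the abstract more concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the authors' DCF implementation (see the repository linked above for that) and it omits the deep feature extraction step entirely. The function name `fit_pu_imc`, the hyperparameters `alpha`, `lam`, and `lr`, the squared-error loss, and the synthetic data are all hypothetical choices made for illustration only.

```python
import numpy as np


def fit_pu_imc(P, X, Y, rank=3, alpha=0.2, lam=0.1, lr=0.01, n_iter=300, seed=0):
    """Toy PU-weighted inductive matrix completion (illustrative, not the DCF code).

    P: (n_genes, n_diseases) binary matrix, 1 = known association, 0 = unlabeled.
    X: (n_genes, d_g) gene features; Y: (n_diseases, d_d) disease features.
    Scores are modeled as X @ W @ H.T @ Y.T, with W and H of width `rank`.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    # PU weighting (an assumed scheme): known positives get weight 1 and
    # unlabeled entries get alpha < 1, since an unlabeled pair is not a
    # confirmed negative.
    C = np.where(P == 1, 1.0, alpha)
    losses = []
    for _ in range(n_iter):
        A, B = X @ W, Y @ H            # low-dimensional gene / disease embeddings
        S = A @ B.T                    # predicted association scores
        R = C * (S - P)                # weighted residual
        losses.append(np.sum(C * (S - P) ** 2)
                      + lam * (np.sum(W ** 2) + np.sum(H ** 2)))
        # Gradients of the weighted squared error plus L2 regularization.
        grad_W = 2 * X.T @ (R @ B) + 2 * lam * W
        grad_H = 2 * Y.T @ (R.T @ A) + 2 * lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H, losses


# Synthetic stand-in data; real inputs would come from OMIM and gene features.
rng = np.random.default_rng(42)
X = rng.standard_normal((30, 8)) / np.sqrt(30)
Y = rng.standard_normal((20, 6)) / np.sqrt(20)
P = (rng.random((30, 20)) < 0.1).astype(float)

W, H, losses = fit_pu_imc(P, X, Y)
scores = X @ W @ H.T @ Y.T
top_genes = np.argsort(-scores[:, 0])[:5]   # top-5 candidate genes for disease 0
print(top_genes)
```

Because the model scores a pair through its feature vectors rather than through per-row latent factors, a disease with no known associations can still be ranked, provided it has a feature vector. In the DCF setting described in the abstract, the gene features `X` would be replaced by representations learned by a deep network; here they are simply random placeholders.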