Abstract:

Background: The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, which greatly restricts their scope of application. Meanwhile, independently constructed and maintained PPI networks usually differ considerably in coverage and quality, making the selection of a suitable PPI network unavoidable but difficult.

Methods: We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach, and we derive an analytic form for the Bayes factor, which naturally measures the strength of association between a query disease and a candidate gene and can therefore be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.

Results: We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large-scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.

Conclusions: The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.
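To make the notion of gene proximity more concrete, the following is a minimal sketch of a diffusion kernel on a toy network. The abstract does not state the exact kernel form, so the standard definition K = exp(-beta * L), with L the graph Laplacian, is assumed here; the function name `diffusion_kernel`, the value of `beta`, and the four-node path network are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=0.5):
    """Diffusion kernel K = exp(-beta * L) of an undirected network.

    Assumes a symmetric adjacency matrix; beta is an illustrative
    diffusion parameter, not a value taken from the paper.
    """
    A = np.asarray(adjacency, dtype=float)
    # Graph Laplacian: degree matrix minus adjacency matrix.
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential can be computed
    # from its eigendecomposition L = V diag(w) V^T.
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * w)) @ V.T

# Toy PPI network: four proteins connected in a chain 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
K = diffusion_kernel(A)

# K[i, j] acts as a proximity score between proteins i and j.
print(np.round(K, 3))
```

In this toy chain, the kernel entry for protein 0 and its direct neighbour is larger than the entries for the more distant proteins, which is the kind of graded proximity a linear model over one or more networks could take as input.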

Computational Approaches for Prioritizing Candidate Disease Genes Based on PPI Networks

Integrating Multiple Protein-Protein Interaction Networks to Prioritize Disease Genes: a Bayesian Regression Approach

Towards Prediction and Prioritization of Disease Genes by the Modularity of Human Phenome-Genome Assembled Network

Towards Identification of Human Disease Phenotype-Genotype Association via a Network-Module Based Method

Constructing Human Phenome-Interactome Networks for the Prioritization of Candidate Genes

Walking on Multiple Disease-Gene Networks to Prioritize Candidate Genes

NRSSPrioritize: Associating Protein Complex and Disease Similarity Information to Prioritize Disease Candidate Genes

A Comprehensive Evaluation of Disease Phenotype Networks for Gene Prioritization

Enhancing Cancer Driver Gene Prediction by Protein-Protein Interaction Network

Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks

GenePANDA—a Novel Network-Based Gene Prioritizing Tool for Complex Diseases

Uncover Disease Genes by Maximizing Information Flow in the Phenome-Interactome Network

PGCN: Disease gene prioritization by disease and gene embedding through graph convolutional neural networks

Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes

Recent Advances in Network-based Methods for Disease Gene Prediction

Comparative Study of Network-Based Prioritization of Protein Domains Associated with Human Complex Diseases

Prioritizing Disease-Related Microbes Based on the Topological Properties of a Comprehensive Network

Predicting disease-related genes by path-based similarity and community structure in protein-protein interaction network

Integration of Protein-Protein Interaction Networks and Gene Expression Profiles Helps Detect Pancreatic Adenocarcinoma Candidate Genes

A network-based machine-learning framework to identify both functional modules and disease genes

Identifying network biomarkers based on protein-protein interactions and expression data