Abstract:

BACKGROUND: Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the mechanisms underlying these associations remain largely unknown. Most disease-associated genetic variants have been shown to reside in noncoding regions, which has led to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods for characterizing how gene expression mediates the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The effect of a genetic variant can propagate along the links of the network to other genes and, ultimately, to the disease. There may be multiple pathways from the genetic variant to the disease, each with a chain structure: the first node is a specific SNP (single nucleotide polymorphism) and the last node is the disease outcome. One key but inadequately addressed question is how to measure the connection strength between nodes and how to rank the effects of such chain-type pathways. Answering it can provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner.

RESULTS: We first introduce the maximal correlation coefficient (MCC) to represent the connection between nodes, and then integrate MCC with a K shortest paths algorithm to rank and identify potential pathways from a genetic variant to disease. We further propose a pathway importance score (PIS) to quantify the importance of each pathway. We term this method "MCC-SP". Various simulations illustrate that MCC measures between-node connection strength better than other quantities, including Pearson correlation, Spearman correlation, distance correlation, mutual information, and the maximal information coefficient.
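The abstract does not state how MCC is estimated from data. The sketch below shows one way to compute it, assuming both variables are discrete (for example, a SNP coded 0/1/2 against binary disease status, or gene expression binned into a few levels beforehand). For finite alphabets, the Hirschfeld-Gebelein-Rényi maximal correlation, i.e. the supremum of corr(f(X), g(Y)) over functions f and g, equals the second-largest singular value of the matrix Q with entries P(x, y) / sqrt(P(x) P(y)). The helper names `maximal_correlation` and `pearson` are hypothetical, and this plug-in estimator is not necessarily the one used in MCC-SP. The toy data contrast MCC with Pearson correlation on a symmetric, non-monotone dependence.

```python
import numpy as np


def maximal_correlation(x, y):
    """Plug-in maximal correlation for two discrete samples (hypothetical helper).

    Uses the finite-alphabet identity: the maximal correlation equals the
    second-largest singular value of Q[i, j] = P(x_i, y_j) / sqrt(P(x_i) P(y_j)).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Encode each observed level as an integer index.
    x_levels, x_idx = np.unique(x, return_inverse=True)
    y_levels, y_idx = np.unique(y, return_inverse=True)
    # Empirical joint distribution P(x, y) from counts.
    joint = np.zeros((len(x_levels), len(y_levels)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    q = joint / np.sqrt(np.outer(px, py))
    # Singular values are returned in descending order; the first is always 1
    # (constant functions), so the maximal correlation is the second one.
    s = np.linalg.svd(q, compute_uv=False)
    return float(s[1]) if len(s) > 1 else 0.0


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    return float(np.corrcoef(x, y)[0, 1])


# Symmetric, non-monotone dependence: Y is a deterministic function of X.
x = np.tile([-1, 0, 1], 1000)
y = x ** 2
print("Pearson:", pearson(x, y))              # numerically zero
print("MCC:", maximal_correlation(x, y))      # 1 up to rounding

# Balanced independent design: every (x, y) pair occurs equally often.
x_ind = np.repeat([0, 1, 2], 4)
y_ind = np.tile([0, 1], 6)
print("MCC (independent):", maximal_correlation(x_ind, y_ind))  # 0 up to rounding
```

Because Y = X² is a deterministic but non-monotone function of X, Pearson correlation is zero while MCC is 1, which is the kind of nonlinear dependence the simulations above are meant to probe. With finite samples the plug-in estimate tends to be biased upward as the number of levels grows, so how continuous expression values are binned is a design choice that would need care.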
Finally, we applied MCC-SP to a real dataset from the Religious Orders Study and the Memory and Aging Project and detected two typical pathways from the APOE genotype to Alzheimer's disease (AD) through gene expression enriched in the Alzheimer's disease pathway.

CONCLUSIONS: MCC-SP shows powerful and robust performance in identifying pathways from a genetic variant to disease. The source code of MCC-SP is freely available on GitHub (https://github.com/zhuyuchen95/ADnet).
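The abstract says MCC is integrated with a K shortest paths algorithm but gives neither the edge-length transformation nor the PIS formula. As a minimal sketch, assume each directed edge u -> v has length -log(MCC(u, v)), so that a shorter path is one whose edge MCCs have a larger product. The graph, node names, and MCC values below are made-up toy inputs, and `dijkstra` and `yen_k_shortest_paths` are hypothetical helper names implementing Yen's algorithm for loopless K shortest paths; PIS is not reproduced here.

```python
import heapq
import math


def dijkstra(graph, source, target, banned_edges=frozenset(), banned_nodes=frozenset()):
    """Shortest source->target path, skipping banned edges and nodes.

    graph maps node -> {neighbor: non-negative edge length}.
    Returns (length, path), or None if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            # Walk predecessor links back to the source.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return None


def path_length(graph, path):
    """Sum of edge lengths along a node path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def yen_k_shortest_paths(graph, source, target, k):
    """Yen's algorithm: up to k loopless paths in order of increasing length."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted = [first[1]]
    candidates = []          # heap of (length, path)
    seen = {tuple(first[1])}
    while len(accepted) < k:
        last = accepted[-1]
        for i in range(len(last) - 1):
            spur_node = last[i]
            root = last[: i + 1]
            # Block edges that would recreate an accepted path sharing this root.
            banned_edges = {
                (p[i], p[i + 1])
                for p in accepted
                if len(p) > i + 1 and p[: i + 1] == root
            }
            # Block root nodes before the spur node so paths stay loopless.
            banned_nodes = set(root[:-1])
            spur = dijkstra(graph, spur_node, target, banned_edges, banned_nodes)
            if spur is None:
                continue
            total = root[:-1] + spur[1]
            if tuple(total) not in seen:
                seen.add(tuple(total))
                heapq.heappush(candidates, (path_length(graph, total), total))
        if not candidates:
            break
        _, best = heapq.heappop(candidates)
        accepted.append(best)
    return accepted


# Toy MCC values for edges of a hypothetical SNP -> gene -> disease network.
mcc = {
    ("SNP", "geneA"): 0.8, ("SNP", "geneB"): 0.5,
    ("geneA", "geneC"): 0.7, ("geneB", "geneC"): 0.9,
    ("geneA", "disease"): 0.3, ("geneB", "disease"): 0.4,
    ("geneC", "disease"): 0.6,
}
# Edge length -log(MCC): minimizing total length maximizes the product of MCCs.
graph = {}
for (u, v), r in mcc.items():
    graph.setdefault(u, {})[v] = -math.log(r)

for path in yen_k_shortest_paths(graph, "SNP", "disease", k=3):
    score = math.exp(-path_length(graph, path))  # product of edge MCCs
    print(" -> ".join(path), f"{score:.3f}")
```

With these toy values the three top-ranked chains are SNP -> geneA -> geneC -> disease (MCC product 0.336), SNP -> geneB -> geneC -> disease (0.270), and SNP -> geneA -> disease (0.240). The log transform turns additive path lengths into multiplicative MCC products; NetworkX's `shortest_simple_paths` is a library alternative to the hand-written Yen routine.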

NRSSPrioritize: Associating Protein Complex and Disease Similarity Information to Prioritize Disease Candidate Genes

HybridRanker: Integrating network structure and disease knowledge to prioritize cancer candidate genes

Towards Prediction and Prioritization of Disease Genes by the Modularity of Human Phenome-Genome Assembled Network

A Robust Phenotype-Driven Likelihood Ratio Analysis Approach Assisting Interpretable Clinical Diagnosis of Rare Diseases

MCC-SP: a Powerful Integration Method for Identification of Causal Pathways from Genetic Variants to Complex Disease

Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks

Integrating Multiple Protein-Protein Interaction Networks to Prioritize Disease Genes: a Bayesian Regression Approach

Walking on Multiple Disease-Gene Networks to Prioritize Candidate Genes

Comparative Study of Network-Based Prioritization of Protein Domains Associated with Human Complex Diseases

Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes

GenePANDA—a Novel Network-Based Gene Prioritizing Tool for Complex Diseases

A Meta-Analysis Strategy for Gene Prioritization Using Gene Expression, SNP Genotype, and eQTL Data

TransNeT-CGP: A cluster-based comorbid gene prioritization by integrating transcriptomics and network-topological features

Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity

Sequence-Based Prioritization of Nonsynonymous Single-Nucleotide Polymorphisms for the Study of Disease Mutations

Prioritization of Candidate Nonsynonymous Single Nucleotide Polymorphisms via Sequence Conservation Features

Disease gene prioritization using network topological analysis from a sequence based human functional linkage network

A Comprehensive Evaluation of Disease Phenotype Networks for Gene Prioritization

Ranking Cancer Proteins by Integrating PPI Network and Protein Expression Profiles

Identification of Disease-Related nsSNPs via the Integration of Protein Sequence Features and Domain-Domain Interaction Data

Extraction of Sequence Conservation Features for the Prioritization of Candidate Single Amino Acid Polymorphisms