Prioritization of candidate disease genes by combining topological similarity and semantic similarity

Bin Liu, Min Jin, Pan Zeng
DOI: https://doi.org/10.1016/j.jbi.2015.07.005
IF: 8
2015-01-01
Journal of Biomedical Informatics
Abstract:
Highlights:
- We use the Hybrid Relative Specificity Similarity algorithm to measure semantic similarity.
- We combine semantic similarity information with a random-walk-based algorithm on the PPI network to predict causal genes.
- The algorithm is robust regardless of which test sets or parameter values are used.

The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performance. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between candidate genes and known disease genes to set the initial probability vector of a random walk with restart algorithm on a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactorial diseases, including Parkinson's disease, breast cancer and obesity. The top-ranked predictions were consistent with findings reported in the literature, which further illustrates the effectiveness of our method.
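The abstract describes seeding a random walk with restart on the PPI network with semantic similarity scores, iterating p(t+1) = (1 - r) W p(t) + r p0 until the probabilities settle. A minimal NumPy sketch of that idea follows, under stated assumptions: the column-normalized transition matrix W, the proportional seeding rule in the hypothetical `seed_vector` helper, the restart probability r = 0.7, and the toy network and similarity scores are illustrative choices, not details taken from the paper. The Hybrid Relative Specificity Similarity computation itself is not reproduced; its scores are simply passed in as input.

```python
import numpy as np


def column_normalize(adj) -> np.ndarray:
    """Turn a symmetric PPI adjacency matrix into a column-stochastic transition matrix W."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Isolated proteins give a zero column; keep it as zeros instead of dividing by 0.
    col_sums[col_sums == 0] = 1.0
    return adj / col_sums


def seed_vector(similarity) -> np.ndarray:
    """HYPOTHETICAL seeding rule: each gene's initial probability is proportional to
    its semantic similarity to the known disease genes, normalized to sum to 1."""
    p0 = np.asarray(similarity, dtype=float)
    total = p0.sum()
    if total <= 0:
        raise ValueError("at least one gene needs a positive similarity score")
    return p0 / total


def random_walk_with_restart(adj, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change drops below tol.
    restart=0.7 is an illustrative value, not necessarily the paper's setting."""
    W = column_normalize(adj)
    p0 = np.asarray(p0, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy 5-protein network (made-up data): edges 0-1, 1-2, 2-3, 3-4, 1-3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Made-up semantic similarity scores of each gene to the known disease genes.
similarity = np.array([1.0, 0.0, 0.3, 0.0, 0.1])
p0 = seed_vector(similarity)
scores = random_walk_with_restart(adj, p0)

# Rank genes by steady-state probability, highest first.
ranking = np.argsort(-scores)
print("scores:", np.round(scores, 4))
print("ranking:", ranking.tolist())
```

Candidate genes would then be ranked by their steady-state probability. Because W is column-stochastic, each step is a contraction with factor 1 - r in the L1 norm, so the iteration converges to the same vector as the closed form r (I - (1 - r) W)^(-1) p0; the iterative form is shown because it avoids inverting a matrix the size of the whole interactome.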
What problem does this paper attempt to address?