Prioritizing Disease Genes by Using Search Engine Algorithm
Min Li, Ruiqing Zheng, Qi Li, Jianxin Wang, Fang-Xiang Wu, Zhuohua Zhang
DOI: https://doi.org/10.2174/1574893611666160125220905
2016-01-01
Current Bioinformatics
Abstract: Identifying disease genes from a large number of candidates for a specific disease is a fundamental challenge. Because experiment-based biological methods are generally time-consuming and laborious, computational approaches have become a new strategy for identifying disease candidates. In this paper, we propose PDGTR, an algorithm based on a search engine ranking method, to prioritize disease candidates. First, we construct a weighted human disease network by calculating the topological similarity and phenotype similarity of each pair of diseases. Then, we calculate the similarities of all genes using the protein-protein interaction network and the edge clustering coefficient. For a specific disease, a logistic regression model is used to generate the prior knowledge of each gene. Finally, PDGTR is applied to prioritize the disease candidates. PDGTR was tested on five diseases (breast cancer, colorectal cancer, hepatocellular carcinoma, gastric cancer, and osteoporosis) and compared with four state-of-the-art algorithms: RWR, DADA, PRINCE, and PRP. Experimental results based on leave-one-out cross-validation, precision, ROC curves, and enrichment show that PDGTR outperforms RWR, DADA, PRINCE, and PRP. Moreover, some potential disease genes predicted by PDGTR have already been reported in the literature.
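The abstract does not spell out the PDGTR update rule, so the following is only a minimal sketch of the general family it belongs to: a PageRank-style iteration over a weighted gene network in which the restart distribution is taken from per-gene prior scores. It is not the authors' algorithm. The function name `personalized_rank`, the damping value, the toy network, and the gene labels `G1` to `G4` are all hypothetical and chosen purely for illustration.

```python
def personalized_rank(adjacency, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style ranking with a prior-driven restart distribution.

    adjacency: dict mapping node -> {neighbor: edge weight}; every neighbor
               must itself be a key of `adjacency` (assumption of this sketch).
    prior:     dict mapping node -> non-negative prior score (missing = 0).
    """
    nodes = sorted(adjacency)
    # Total outgoing weight of each node, used to normalize transitions.
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    # Normalize the prior into a probability distribution for restarts.
    total_prior = sum(prior.get(u, 0.0) for u in nodes)
    restart = {u: prior.get(u, 0.0) / total_prior for u in nodes}

    score = dict(restart)
    for _ in range(max_iter):
        # Restart term: a (1 - damping) share always follows the prior.
        nxt = {u: (1.0 - damping) * restart[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling node: hand its mass back to the restart distribution.
                for v in nodes:
                    nxt[v] += damping * score[u] * restart[v]
                continue
            # Spread the score of u over its neighbors, proportional to weight.
            for v, w in adjacency[u].items():
                nxt[v] += damping * score[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - score[u]) for u in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Hypothetical toy gene network with symmetric similarity weights.
toy_network = {
    "G1": {"G2": 1.0, "G3": 0.5},
    "G2": {"G1": 1.0, "G3": 0.8},
    "G3": {"G1": 0.5, "G2": 0.8, "G4": 0.3},
    "G4": {"G3": 0.3},
}
toy_prior = {"G1": 1.0}  # hypothetical: G1 is the only gene with prior evidence

scores = personalized_rank(toy_network, toy_prior)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, the prior plays the role that the abstract assigns to the logistic regression output, and the edge weights stand in for the gene similarities; how PDGTR actually combines the disease network, gene similarities, and priors is described in the full paper, not here.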