Combination Use of Protein–protein Interaction Network Topological Features Improves the Predictive Scores of Deleterious Non-Synonymous Single-Nucleotide Polymorphisms

Yiming Wu, Runyu Jing, Lin Jiang, Yanping Jiang, Qifan Kuang, Ling Ye, Lijun Yang, Yizhou Li, Menglong Li
DOI: https://doi.org/10.1007/s00726-014-1760-9
IF: 3.7891
2014-01-01
Amino Acids
Abstract: Single-nucleotide polymorphisms (SNPs) are the most frequent form of genetic variation. Non-synonymous SNPs (nsSNPs) occurring in coding regions result in single amino acid substitutions that are associated with human hereditary diseases. Many approaches have been designed to distinguish deleterious from neutral nsSNPs based on sequence-level information. The novelty of this work is the introduction of combinations of protein-protein interaction (PPI) network topological features for predicting disease-related nsSNPs. Based on a dataset compiled from Swiss-Prot, a random forest model was constructed that achieved an average accuracy of 80.43% and an MCC of 0.60 in a rigorous tenfold cross-validation test. On an independent dataset, the model achieved an accuracy of 88.05% and an MCC of 0.67. Compared with previous studies, the approach showed superior prediction ability. The results showed that the incorporated PPI network topological features outperform conventional features. Further analysis indicated that disease-related proteins are topologically different from other proteins. This study suggests that nsSNPs may share some of the topological information of their proteins, and that changes in topological attributes could provide clues for explaining functional shifts caused by nsSNPs.
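The abstract reports performance as accuracy and MCC (Matthews correlation coefficient). As a minimal sketch of how these two metrics are computed from a binary confusion matrix, the following Python snippet may help. The function name `accuracy_and_mcc` and the example counts are hypothetical illustrations and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (here, "positive" would
    mean a deleterious nsSNP).
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts chosen for illustration only, not the paper's data.
acc, mcc = accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")  # accuracy=0.80, MCC=0.60
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the deleterious and neutral classes are imbalanced.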