Prediction of kinase-specific phosphorylational interactions using random forest
Wen Liu, Yanzhi Guo, Jiesi Luo, Yun Zhong, Xiaojiao Yang, Xuemei Pu, Menglong Li
DOI: https://doi.org/10.1016/j.chemolab.2013.05.005
IF: 4.175
2013-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Protein kinases mediate many important cellular and molecular processes by phosphorylating specific substrates, so it is of great significance to correctly identify the functions of these kinases and their downstream substrates. However, only 35% of human kinases have known substrates, whereas protein–protein interaction (PPI) data are available for 85% of them. It is therefore likely that potential kinase–substrate pairs can be revealed using these PPIs. In this paper, we compiled five unbiased interaction datasets covering four major serine/threonine (S/T) protein kinase families (CDK, CK2, PKA and PKC) and one important tyrosine (Y) kinase family (SRC). Based on these datasets, prediction models for identifying kinase-specific interactions were developed using random forest (RF). Seven physicochemical properties of amino acids and the auto covariance (AC) transform were used for the numerical representation of protein sequences. Permutation importance analysis was then used for feature optimization, and 102 optimal features were selected from the full set of 420 variables. The model gives an area under the receiver operating characteristic curve (AUC) of 0.8785 in classifying phosphorylational and non-phosphorylational interactions between kinases and their substrates. For predicting kinase family-specific phosphorylational interactions, the method yields a prediction accuracy of over 90% on the independent datasets for CDK, CK2, PKA, PKC and SRC. Finally, a tolerance test and a shuffling experiment were performed to further verify the reliability of the models. All results demonstrate that the method is useful for identifying new PPIs of different kinase families.
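
The auto covariance (AC) representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder property scales and the lag of 30 are all assumptions made for illustration. The abstract does not state the lag; 30 is chosen only because 7 properties × 30 lags × 2 proteins gives 420 variables, matching the reported count. A real pipeline would use published physicochemical scales, typically normalized across the 20 amino acids, in place of the toy values below.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# PLACEHOLDER scales: arbitrary numbers, NOT real physicochemical properties.
# Seven scales are used only to mirror the "seven properties" in the abstract.
TOY_SCALES: List[Dict[str, float]] = [
    {aa: float((i * k) % 20) for i, aa in enumerate(AMINO_ACIDS)}
    for k in (1, 3, 7, 9, 11, 13, 17)
]


def auto_covariance(
    sequence: str, scales: Sequence[Dict[str, float]], max_lag: int
) -> List[float]:
    """Return AC features: one value per (scale, lag) pair.

    AC(lag, j) = 1/(n - lag) * sum_i (x[i] - mean) * (x[i + lag] - mean),
    where x is the sequence encoded with property scale j and n is its length.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features: List[float] = []
    for scale in scales:
        values = [scale[aa] for aa in sequence]
        mean = sum(values) / n
        centered = [v - mean for v in values]
        for lag in range(1, max_lag + 1):
            total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
            features.append(total / (n - lag))
    return features


def pair_features(
    kinase_seq: str,
    substrate_seq: str,
    scales: Sequence[Dict[str, float]],
    max_lag: int = 30,  # assumed lag; not stated in the abstract
) -> List[float]:
    """Concatenate the AC vectors of both proteins in an interaction pair."""
    return auto_covariance(kinase_seq, scales, max_lag) + auto_covariance(
        substrate_seq, scales, max_lag
    )


if __name__ == "__main__":
    # Hypothetical toy sequences, long enough for a lag of 30.
    vector = pair_features("ACDEFGHIKL" * 4, "MNPQRSTVWY" * 4, TOY_SCALES)
    print(len(vector))
```

Because the vector length depends only on the number of scales and the lag, proteins of different lengths map to feature vectors of the same size, which is what allows a classifier such as random forest to be trained on them directly.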