Identification of Protein Lysine Crotonylation Sites by a Deep Learning Framework with Convolutional Neural Networks

Yiming Zhao, Ningning He, Zhen Chen, Lei Li
DOI: https://doi.org/10.1109/access.2020.2966592
IF: 3.9
2020-01-01
IEEE Access
Abstract: Protein lysine crotonylation (Kcr) is an important type of post-translational modification (PTM) that regulates various activities. Experimental approaches to identifying Kcr sites are time-consuming, so computational prediction approaches are needed. Previously, a few classifiers were developed from just over 100 Kcr sites on histone proteins. Recently, thousands of Kcr sites have been experimentally verified on non-histone proteins from the plant species papaya. We found that the previous classifiers fail to identify non-histone Kcr sites, so classifiers dedicated to non-histone proteins are needed. Accordingly, we constructed 11 different classifiers to recognize non-histone Kcr sites by combining different features and algorithms (such as random forest and a convolutional neural network (CNN)). They were compared using both ten-fold cross-validation and an independent test dataset. The classifier based on a CNN and the word embedding approach, dubbed pKcr, performed better than the other classifiers. pKcr obtained AUC values of 0.855 and 0.853 in ten-fold cross-validation and the independent test, respectively. The absence of a statistically significant difference between these two results indicates that pKcr does not overfit. In the pKcr framework, a peptide is split into biological characters, which are then transformed into numeric vectors. These vectors are fed into the CNN, where multiple convolution kernels automatically extract various features and pooling layers perform feature selection. The superior performance of pKcr suggests that this algorithm is well suited for Kcr prediction and may be applied broadly to predicting other types of PTM sites. pKcr is available at http://www.bioinfogo.org/pkcr.
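The embedding-plus-CNN pipeline summarized in the abstract (peptide split into characters, characters mapped to vectors, several convolution kernels extracting features, pooling selecting them) can be illustrated with a small forward pass. This is a minimal sketch under stated assumptions, not the pKcr implementation: the single-residue tokens, the "X" padding symbol, the embedding size, the kernel widths (3, 5, 7), global max pooling, and the single sigmoid output are all hypothetical choices, and every weight is randomly initialized and untrained. The helper names (`tokenize`, `embed`, `conv1d`, `forward`, `init_params`) are illustrative only.

```python
import math
import random

# Hypothetical vocabulary: 20 standard amino acids plus "X" for padding/unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(peptide):
    """Split a peptide into single-residue 'characters' and map each to an integer index."""
    return [VOCAB.get(aa, VOCAB["X"]) for aa in peptide.upper()]


def embed(indices, table):
    """Word-embedding step: look up a dense vector for every token index."""
    return [table[i] for i in indices]


def conv1d(vectors, kernel, bias):
    """Valid 1-D convolution along the sequence axis, followed by ReLU.

    kernel has shape [width][embedding_dim]; returns one activation per window position.
    """
    width = len(kernel)
    out = []
    for start in range(len(vectors) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * x for w, x in zip(kernel[k], vectors[start + k]))
        out.append(max(0.0, s))
    return out


def init_params(embedding_dim=8, kernel_widths=(3, 5, 7), filters_per_width=4, seed=0):
    """Randomly initialize (untrained) weights; all sizes are hypothetical."""
    rng = random.Random(seed)
    rand = lambda: rng.uniform(-0.1, 0.1)
    embedding = [[rand() for _ in range(embedding_dim)] for _ in AMINO_ACIDS]
    filters = []
    for width in kernel_widths:
        for _ in range(filters_per_width):
            kernel = [[rand() for _ in range(embedding_dim)] for _ in range(width)]
            filters.append((kernel, 0.0))
    dense = [rand() for _ in filters]
    return {"embedding": embedding, "filters": filters, "dense": dense, "dense_bias": 0.0}


def forward(peptide, params):
    """Embed -> multi-kernel convolution -> global max pooling -> sigmoid score."""
    max_width = max(len(kernel) for kernel, _ in params["filters"])
    if len(peptide) < max_width:
        raise ValueError("peptide is shorter than the widest convolution kernel")
    x = embed(tokenize(peptide), params["embedding"])
    pooled = []
    for kernel, bias in params["filters"]:
        feature_map = conv1d(x, kernel, bias)
        pooled.append(max(feature_map))  # global max pooling keeps the strongest response
    logit = sum(w * f for w, f in zip(params["dense"], pooled)) + params["dense_bias"]
    return 1.0 / (1.0 + math.exp(-logit))  # score for "central K is a Kcr site"


params = init_params()
# Hypothetical peptide window around a lysine; the real window length used by pKcr is not assumed here.
score = forward("ADSPGKRLEGTKVSA", params)
print(round(score, 4))
```

Each kernel width scans the embedded sequence for motifs of a different length, and max pooling reduces every feature map to a single value, which is one common way to realize the "pooling layers perform feature selection" idea. In practice the embedding and convolution weights would be learned from labeled Kcr and non-Kcr peptides rather than drawn at random.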
What problem does this paper attempt to address?