Lysine Malonylation Identification in E-coli with Multiple Features

Yan Xu, Yingxi Yang, Hui Wang, Yuanhai Shao
DOI: https://doi.org/10.2174/1570164615666181005104614
2019-01-01
Current Proteomics
Abstract:
Motivation: Lysine malonylation in eukaryotic proteins was discovered in 2011 through high-throughput proteomic analysis. However, it remained poorly understood in prokaryotes. Recent research has shown that malonylation in E. coli is significantly enriched in protein translation, energy metabolism pathways and fatty acid biosynthesis.

Results: In this work we propose a predictor that identifies lysine malonylation sites in E. coli from physicochemical properties, binary code and sequence frequency features using a support vector machine (SVM) algorithm. The experimentally determined lysine malonylation sites were retrieved from the first and, to date, largest malonylome dataset in prokaryotes. The combination of physicochemical properties and position-specific amino acid sequence propensity features achieved the best results in 10-fold cross-validation, with an AUC (area under the receiver operating characteristic curve) of 0.7994 and an MCC (Matthews correlation coefficient) of 0.4335. The AUC values were 0.7800, 0.7851 and 0.8050 in 6-fold, 8-fold and LOO (leave-one-out) cross-validation, respectively. All the ROC curves were close to each other, which illustrates the robustness and performance of the proposed predictor. We also analyzed the sequence propensities with TwoSampleLogo and found peptide differences that were significant under a t-test (p < 0.01). The predictor outperformed the other methods tested: K-Nearest Neighbors, C4.5 decision tree, Naïve Bayes and Random Forest. Functional analysis showed that malonylated proteins were involved in many transcription activities and diverse biological processes. We also developed an online package, which can be freely downloaded from https://github.com/Sunmile/ Malonylation E.coli.
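The binary-code and position-specific propensity features, and the MCC metric named in the abstract, can be illustrated with a short Python sketch. The abstract does not give the window length, residue ordering, padding symbol, or exact propensity formula, so the helpers below (`binary_encode`, `propensity_table`, `propensity_encode`, `matthews_corrcoef`) are hypothetical. The propensity score uses one common formulation (per-position frequency in positive windows minus frequency in negative windows), which is an assumption rather than the authors' confirmed definition. The physicochemical-property features and the SVM itself are not reproduced.

```python
import math
from collections import Counter

# The 20 standard amino acids; this ordering is an arbitrary (assumed) choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(window: str) -> list[int]:
    """Binary (one-hot) encoding: 20 bits per residue in the window.

    Symbols outside the 20 standard residues (e.g. an 'X' padding
    character near protein termini, an assumption) map to all zeros.
    """
    bits = []
    for residue in window.upper():
        bits.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return bits


def propensity_table(positives: list[str], negatives: list[str]) -> list[dict[str, float]]:
    """Per-position propensity: frequency among positive (malonylated)
    windows minus frequency among negative windows. Hypothetical
    formulation; all windows are assumed to share one length."""
    table = []
    for i in range(len(positives[0])):
        pos_counts = Counter(seq[i] for seq in positives)
        neg_counts = Counter(seq[i] for seq in negatives)
        table.append({
            aa: pos_counts[aa] / len(positives) - neg_counts[aa] / len(negatives)
            for aa in AMINO_ACIDS
        })
    return table


def propensity_encode(window: str, table: list[dict[str, float]]) -> list[float]:
    """One propensity score per position; unknown symbols score 0.0."""
    return [table[i].get(residue, 0.0) for i, residue in enumerate(window)]


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal sum is zero
    return (tp * tn - fp * fn) / denom


# Toy windows centered on lysine (K); hypothetical, not from the paper's data.
positives = ["AKD", "AKC"]
negatives = ["CKC", "CKA"]
table = propensity_table(positives, negatives)
print(len(binary_encode("AKD")))        # 60
print(propensity_encode("AKD", table))  # [1.0, 0.0, 0.5]
print(matthews_corrcoef(5, 5, 0, 0))    # 1.0
```

In a full pipeline the per-window vectors would typically be concatenated and passed to an SVM implementation such as scikit-learn's `SVC`; scikit-learn also ships `sklearn.metrics.matthews_corrcoef` and `sklearn.metrics.roc_auc_score` for the MCC and AUC figures reported above.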