UbiSitePred: A novel method for improving the accuracy of ubiquitination sites prediction by using LASSO to select the optimal Chou's pseudo components
Xiaowen Cui,Zhaomin Yu,Bin Yu,Minghui Wang,Baoguang Tian,Qin Ma
DOI: https://doi.org/10.1016/j.chemolab.2018.11.012
IF: 4.175
2019-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Ubiquitination is an essential protein post-translational modification that plays a crucial role in cellular processes such as proteasomal degradation, transcriptional regulation, and DNA damage repair. Recognizing ubiquitination sites is therefore a crucial step toward understanding the molecular mechanisms of ubiquitination. However, experimental verification of numerous ubiquitination sites is time-consuming and costly, so a computational approach is needed to predict them. This paper proposes a new method, UbiSitePred, that predicts ubiquitination sites by combining least absolute shrinkage and selection operator (LASSO) feature selection with a support vector machine (SVM). First, binary encoding (BE), pseudo-amino acid composition (PseAAC), the composition of k-spaced amino acid pairs (CKSAAP), and position-specific propensity matrices (PSPM) are used to extract sequence feature information, yielding the initial feature space. Second, LASSO is applied to remove redundant features and select the optimal feature subset. Finally, the optimal feature subset is input into the SVM to predict ubiquitination sites. Five-fold cross-validation shows that UbiSitePred achieves better prediction performance than other methods: the AUC values for Set1, Set2, and Set3 are 0.9998, 0.8887, and 0.8481, and the overall accuracy rates are 98.33%, 81.12%, and 76.90%, respectively. The results demonstrate that the proposed method is significantly superior to other state-of-the-art prediction methods and provides a new idea for predicting other protein post-translational modification sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSitePred/.
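The first two stages described in the abstract (feature extraction, then LASSO selection) can be illustrated with a minimal sketch. It is not the authors' implementation, which is available at the repository above. It covers only BE and CKSAAP, leaves out PseAAC, PSPM, and the SVM classifier, and uses a generic textbook coordinate-descent LASSO without an intercept. The function names, the 20-letter alphabet, the all-zero encoding of non-standard residues, and the values of `k_max` and `alpha` are all assumptions made for illustration.

```python
# Illustrative sketch only: names, alphabet handling, and parameters are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def binary_encode(window):
    """One-hot encode each residue; non-standard symbols become all zeros (assumption)."""
    vec = []
    for residue in window:
        one_hot = [0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            one_hot[AA_INDEX[residue]] = 1
        vec.extend(one_hot)
    return vec


def cksaap(window, k_max=2):
    """Frequencies of residue pairs separated by k = 0..k_max positions (400 values per k)."""
    vec = []
    for k in range(k_max + 1):
        counts = [0.0] * 400
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            a, b = window[i], window[i + k + 1]
            if a in AA_INDEX and b in AA_INDEX:
                counts[AA_INDEX[a] * 20 + AA_INDEX[b]] += 1
        if n_pairs > 0:
            counts = [c / n_pairs for c in counts]
        vec.extend(counts)
    return vec


def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_select(X, y, alpha=0.01, n_iter=100):
    """Return indices of features with non-zero LASSO weights.

    Minimises (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate
    descent. No intercept is fitted, so X and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n))
            if z == 0.0:
                continue  # a constant-zero column can never be selected
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n))
            new_wj = soft_threshold(rho / n, alpha) / (z / n)
            for i in range(n):
                residual[i] += (w[j] - new_wj) * X[i][j]
            w[j] = new_wj
    return [j for j, wj in enumerate(w) if abs(wj) > 1e-12]


if __name__ == "__main__":
    # Hypothetical 5-residue window centred on a lysine (K).
    window = "AGKLV"
    features = binary_encode(window) + cksaap(window, k_max=1)
    print(len(features))
```

In a full pipeline, the feature vectors from all encodings would be concatenated per site, the indices returned by `lasso_select` would define the reduced feature subset, and that subset would be passed to an SVM evaluated with five-fold cross-validation, as the abstract describes.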