Prediction of Cell-Penetrating Peptides Using both Support Vector Machine and Linear Discriminant Analysis

Chen Guohua, Xia Zhining, Lu Yao
2010-01-01
Acta Chimica Sinica
Abstract: To identify new potential cell-penetrating peptides (CPPs), two methods, Fisher's linear discriminant analysis (LDA) and the support vector machine (SVM), were used to construct two classifiers. We identified 123 known natural CPPs from the literature and used them to construct two data sets: a training set with 25 CPPs and 16 non-CPPs, and a test set with 61 CPPs and 21 non-CPPs. Each amino acid was described by its principal properties (z-scales), and the resulting auto cross covariances (ACCs) and their principal components were each used as input to the classifiers. The models obtained with Fisher's LDA classified only 57.3% of the test set correctly, even though they showed high classification rates on the training set in both training and cross-validation. With SVM, the classification rates on the training set were 100% in training (75.6% in leave-one-out cross-validation) using the 72 ACCs, and 85.4% (80.5%) using their principal components. The best SVM result on the test set was 74.4%, obtained with the 72 ACCs. These results indicate that the SVM can capture minor changes in the variables and that the SVM model outperforms the LDA model.
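The ACC descriptors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each peptide is already encoded as a matrix of per-residue z-scale values that were centered beforehand, and it uses one common form of the ACC: the average product of scale j at position i and scale k at position i + lag. The function name `acc_descriptors`, the choice of three z-scales, and the maximum lag of 8 are assumptions for illustration. They happen to give 3 × 3 × 8 = 72 values, which matches the count in the abstract, but the abstract does not state how its 72 ACCs were composed. The input values are random placeholders, not real z-scales.

```python
import numpy as np


def acc_descriptors(z, max_lag):
    """Flattened auto cross covariances for one peptide (illustrative sketch).

    z       : array of shape (n_residues, n_scales), one row per residue,
              assumed to be centered already.
    max_lag : largest sequence separation to include (hypothetical choice).
    Returns a vector of length n_scales * n_scales * max_lag.
    """
    z = np.asarray(z, dtype=float)
    n, _ = z.shape
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the peptide length")
    features = []
    for lag in range(1, max_lag + 1):
        # Entry [j, k] averages z[i, j] * z[i + lag, k] over all valid i.
        cov = z[:-lag].T @ z[lag:] / (n - lag)
        features.append(cov.ravel())
    return np.concatenate(features)


# Placeholder encoding of a 12-residue peptide with 3 descriptor scales.
rng = np.random.default_rng(0)
peptide = rng.normal(size=(12, 3))
vector = acc_descriptors(peptide, max_lag=8)
print(vector.shape)  # (72,)
```

Because the output length depends only on the number of scales and the maximum lag, peptides of different lengths map to vectors of equal size, which is what allows them to be passed to a classifier such as LDA or SVM.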