A Double-SVM Classification System for Single and Multiple-Subcellular Localizations of Yeast Proteins Using Sequence Motifs

Su Zhang, Wei Yang, Hongtao Lu, Zhizhou Zhang
DOI: https://doi.org/10.1109/icia.2007.4295720
2007-01-01
Abstract: The cellular localization site of a protein is closely related to its potential function. In this paper, we develop a novel Double-SVM Classification System for predicting the subcellular localization sites of proteins. First, a set of features is constructed from the occurrence frequencies of sequence motifs. Discriminant features are then selected by I-RELIEF and used as inputs to a support vector machine (SVM) for classification. The two classes are single and multiple subcellular localization. Because protein sequences differ greatly in length, we use two SVMs, one for the shorter sequences and the other for the longer ones. The system is applied to predict the subcellular localization sites of yeast proteins. The experimental results show that the testing accuracy of the system is 66%, which is higher than that of the traditional single-SVM model.
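The pipeline described in the abstract (motif occurrence-frequency features, then one classifier for shorter sequences and another for longer ones) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the motif list `MOTIFS`, the length threshold, the toy sequences, and the class names. To keep the example dependency-free, a nearest-centroid classifier stands in for each SVM, and the I-RELIEF feature-selection step is omitted entirely.

```python
# Hypothetical motif set; the abstract does not list the motifs actually used.
MOTIFS = ["KR", "LL", "SS", "DE"]


def count_overlapping(sequence, motif):
    """Count occurrences of `motif` in `sequence`, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = sequence.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def motif_frequencies(sequence, motifs=MOTIFS):
    """Return one feature per motif: occurrences divided by window positions."""
    features = []
    for motif in motifs:
        windows = len(sequence) - len(motif) + 1
        if windows > 0:
            features.append(count_overlapping(sequence, motif) / windows)
        else:
            features.append(0.0)
    return features


class CentroidClassifier:
    """Nearest-centroid classifier, used here only as a stand-in for an SVM."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, x):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda lab: squared_distance(self.centroids[lab]))


class LengthRoutedClassifier:
    """Route each sequence to a 'short' or 'long' model by its length."""

    def __init__(self, length_threshold):
        # Assumed split rule: sequences shorter than the threshold are 'short'.
        self.length_threshold = length_threshold
        self.short_model = CentroidClassifier()
        self.long_model = CentroidClassifier()

    def _is_short(self, sequence):
        return len(sequence) < self.length_threshold

    def fit(self, sequences, labels):
        # Each length group needs at least one training sequence.
        groups = {True: ([], []), False: ([], [])}
        for seq, label in zip(sequences, labels):
            X, y = groups[self._is_short(seq)]
            X.append(motif_frequencies(seq))
            y.append(label)
        self.short_model.fit(*groups[True])
        self.long_model.fit(*groups[False])
        return self

    def predict(self, sequence):
        model = self.short_model if self._is_short(sequence) else self.long_model
        return model.predict_one(motif_frequencies(sequence))


# Toy data, invented for illustration only.
train_sequences = ["KRKRAA", "LLSSLL", "KRKRKRAAAAAAAA", "DEDEDESSSSLLLL"]
train_labels = ["single", "multiple", "single", "multiple"]
model = LengthRoutedClassifier(length_threshold=10).fit(train_sequences, train_labels)
```

Calling `model.predict(sequence)` selects the model by sequence length and classifies the motif-frequency vector. In a closer reproduction, each `CentroidClassifier` would be replaced by a trained SVM and a feature-selection step would run before fitting.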