Using an Ensemble of Support Vector Machine Classifiers to Predict Protein Supersecondary Structural Motifs

Dongsheng Zou, Zhongshi He, Yuan Yan
DOI: https://doi.org/10.4304/jcp.6.10.2053-2059
2011-01-01
Journal of Computers
Abstract: The success of the Human Genome Project and the rapid increase in the number of protein sequences entering databanks have opened a challenging frontier: how to develop a fast and accurate method for predicting protein supersecondary structural motifs. Such a method could help narrow the ever-widening gap between the number of known protein sequences and the number of known structures. To address this problem, this paper proposes a new method for predicting protein supersecondary structural motifs. The method combines basic amino acid composition with dipeptide composition to represent the sequential patterns of proteins. An ensemble classifier based on support vector machines (SVMs) is used to predict four kinds of supersecondary structural motifs in protein sequences. A total of twenty-four increments of diversity are defined for each supersecondary structural motif. The method is trained and tested on the ArchDB40 dataset, which contains 3088 proteins. The highest overall accuracies on the training dataset and the independent test dataset are 74.8% and 69.3%, respectively.
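The abstract names three ingredients: amino acid composition, dipeptide composition, and the increment of diversity (ID). Below is a minimal Python sketch of how ID features of this kind could be computed. It assumes the diversity measure D(X) = N ln N − Σ n_i ln n_i and ID(X, Y) = D(X + Y) − D(X) − D(Y) used in earlier ID-based predictors; the abstract itself does not state these formulas. All helper names are hypothetical, and the layout of two IDs per motif class (one on amino acid counts, one on dipeptide counts) is only an illustration, not the paper's twenty-four-feature scheme. The SVM ensemble that would consume these vectors is not shown.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aa_counts(seq):
    """Counts of the 20 standard amino acids; other letters are ignored."""
    c = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    return [c[a] for a in AMINO_ACIDS]


def dipeptide_counts(seq):
    """Counts of the 400 overlapping dipeptides; pairs with non-standard letters are skipped."""
    s = seq.upper()
    c = Counter(s[i:i + 2] for i in range(len(s) - 1))
    return [c[d] for d in DIPEPTIDES]


def diversity(counts):
    """Assumed diversity measure D = N ln N - sum(n_i ln n_i); zero counts contribute 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(x * math.log(x) for x in counts if x > 0)


def increment_of_diversity(x, y):
    """Assumed ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y are proportional."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


def pooled_standard(train_seqs):
    """Hypothetical class 'standard': amino acid and dipeptide counts summed over training sequences."""
    aa, di = [0] * len(AMINO_ACIDS), [0] * len(DIPEPTIDES)
    for s in train_seqs:
        aa = [a + b for a, b in zip(aa, aa_counts(s))]
        di = [a + b for a, b in zip(di, dipeptide_counts(s))]
    return aa, di


def id_features(seq, standards):
    """Hypothetical layout: two IDs per motif class, in sorted class order."""
    aa, di = aa_counts(seq), dipeptide_counts(seq)
    feats = []
    for motif in sorted(standards):
        std_aa, std_di = standards[motif]
        feats.append(increment_of_diversity(aa, std_aa))
        feats.append(increment_of_diversity(di, std_di))
    return feats
```

For example, with `standards = {"class1": pooled_standard(["ACDEA", "AACDE"]), "class2": pooled_standard(["WYWYK"])}` (placeholder labels and toy sequences), `id_features("ACDE", standards)` returns four non-negative floats. A smaller ID means the query's composition is closer to that class's pooled counts, which is what makes IDs usable as SVM inputs.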