A New Hybrid Approach to Predict Subcellular Localization by Incorporating Protein Evolutionary Conservation Information

ShaoWu Zhang, YunLong Zhang, JunHui Li, HuiFeng Yang, YongMei Cheng, GuoPing Zhou
DOI: https://doi.org/10.1007/978-3-540-74771-0_20
2007-01-01
Abstract: The rapidly increasing number of sequences entering genome databanks has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. Developing a statistical predictor of protein attributes generally involves two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further divided into two subtasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm that performs the prediction accurately. Here, an improved evolutionary conservation algorithm is proposed to calculate a per-residue conservation score. From these scores, each protein is represented as a feature vector created with multi-scale energy (MSE). In addition, the protein is represented by other feature vectors based on amino acid composition (AAC), a weighted auto-correlation function, and moment descriptor methods. Finally, a novel hybrid approach is developed that fuses the four kinds of feature classifiers through a product rule system to predict 12 subcellular locations. Compared with existing methods, this new approach provides better predictive performance. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are quite promising, and may also hold great potential as a useful vehicle in other areas of molecular biology.
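Two of the ingredients named in the abstract, amino acid composition and product-rule fusion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the base classifiers or how their outputs are scaled, so the sketch assumes each feature classifier returns one probability estimate per class. The function names `amino_acid_composition` and `product_rule_fusion`, and the small constant added before taking logarithms, are hypothetical choices made for illustration.

```python
import numpy as np

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies (AAC)."""
    counts = np.array([sequence.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    # Guard against empty input or input with no standard residues.
    return counts / total if total > 0 else counts


def product_rule_fusion(class_probabilities):
    """Pick the class whose probabilities have the largest product.

    class_probabilities: one sequence of per-class probabilities for each
    classifier (assumed here; the abstract does not give the output format).
    """
    stacked = np.vstack(class_probabilities)
    # Summing logs is equivalent to multiplying and avoids underflow;
    # the 1e-12 offset is an illustrative safeguard against log(0).
    log_scores = np.log(stacked + 1e-12).sum(axis=0)
    return int(np.argmax(log_scores))


if __name__ == "__main__":
    print(amino_acid_composition("AACD"))
    print(product_rule_fusion([[0.6, 0.3, 0.1],
                               [0.2, 0.7, 0.1],
                               [0.5, 0.4, 0.1]]))
```

Under the product rule, a class that any single classifier rates as very unlikely is strongly penalized, so the fused decision favors classes on which all feature classifiers broadly agree.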