Predicting Subchloroplast Locations of Proteins Based on the General Form of Chou's Pseudo Amino Acid Composition: Approached from Optimal Tripeptide Composition

Hao Lin, Chen Ding, Lu-Feng Yuan, Wei Chen, Hui Ding, Zi-Qiang Li, Feng-Biao Guo, Jian Huang, Ni-Ni Rao
DOI: https://doi.org/10.1142/s1793524513500034
2013-01-01
International Journal of Biomathematics
Abstract: Chloroplasts are organelles found in plant cells that conduct photosynthesis. The subchloroplast locations of proteins are correlated with their functions. With the growing volume of available protein data, a computational method for predicting the subchloroplast locations of chloroplast proteins is highly desirable. In this study, we propose a novel method that predicts subchloroplast locations of proteins from tripeptide compositions. The method first uses the binomial distribution to optimize the feature set; a support vector machine then performs the prediction. The method was tested on a reliable and rigorous dataset of 259 chloroplast proteins with sequence identity ≤ 25%. In jackknife cross-validation, 92.21% of envelope proteins, 93.20% of thylakoid membrane proteins, 52.63% of thylakoid lumen proteins, and 85.00% of stroma proteins were correctly identified. The overall accuracy reaches 88.03%, which is higher than that of other models. Based on this method, a predictor called ChloPred has been built and is freely available at http://cobi.uestc.edu.cn/people/hlin/tools/ChloPred/ . The predictor will provide important information for theoretical and experimental research on chloroplast proteins.
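The two feature steps named in the abstract, tripeptide composition and binomial-distribution feature optimization, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the scoring below is an assumption: each tripeptide is scored by a confidence level, one minus the binomial tail probability of seeing at least its observed count in a class, where the per-trial probability is that class's share of all tripeptides. All function names are hypothetical and are not taken from ChloPred.

```python
from collections import Counter
from itertools import product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_counts(sequence):
    """Count overlapping tripeptides made only of the 20 standard residues."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if all(c in AMINO_ACIDS for c in tri):
            counts[tri] += 1
    return counts


def tripeptide_composition(sequence):
    """Return the 8000-dimensional normalized tripeptide frequency vector."""
    counts = tripeptide_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / total for t in TRIPEPTIDES]


def binomial_tail(n, k, q):
    """P(X >= k) for X ~ Binomial(n, q).

    Direct summation is fine for toy inputs; large n can overflow when the
    integer coefficient is converted to float, so real data would need a
    numerically stable routine.
    """
    return sum(comb(n, m) * q**m * (1 - q) ** (n - m) for m in range(k, n + 1))


def rank_tripeptides(sequences_by_class):
    """Rank tripeptides by their highest confidence level across classes.

    ASSUMPTION: confidence level = 1 - binomial tail probability; the
    abstract does not spell out the scoring formula.
    """
    class_counts = {}
    for label, seqs in sequences_by_class.items():
        merged = Counter()
        for s in seqs:
            merged.update(tripeptide_counts(s))
        class_counts[label] = merged

    class_totals = {label: sum(c.values()) for label, c in class_counts.items()}
    grand_total = sum(class_totals.values())

    overall = Counter()
    for c in class_counts.values():
        overall.update(c)

    scores = {}
    for tri, n_i in overall.items():
        best = 0.0
        for label, c in class_counts.items():
            q = class_totals[label] / grand_total  # class share of all tripeptides
            best = max(best, 1.0 - binomial_tail(n_i, c[tri], q))
        scores[tri] = best

    # Highest confidence first; ties broken alphabetically for determinism.
    ranking = sorted(scores, key=lambda t: (-scores[t], t))
    return ranking, scores
```

In a pipeline of the kind the abstract describes, the top-ranked tripeptides would define a reduced feature vector that is passed to a support vector machine and evaluated by jackknife (leave-one-out) cross-validation. How many features to keep is not stated in the abstract and would have to be tuned.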