PREDICTION OF PROTEIN SUBCELLULAR LOCALIZATION USING A NOVEL FEATURE EXTRACTION METHOD: SEQUENCE-SEGMENTED PSEUDO AMINO ACID COMPOSITION

杨会芳,程咏梅,张绍武,潘泉
DOI: https://doi.org/10.3321/j.issn:1000-6737.2008.03.010
2008-01-01
ACTA BIOPHYSICA SINICA
Abstract: Knowing a protein's subcellular localization is important because it provides insight into the protein's function, as well as how and in what kind of cellular environment proteins interact with each other and with other molecules. A novel feature extraction method, sequence-segmented pseudo amino acid composition (PseAAC), has been developed to predict protein subcellular localization on two datasets (C2129 and CS2423) first constructed by Chou and Shen. The authors used support vector machines as the classifier and evaluated the prediction system with measures including the overall accuracy Q3 and the content-balance accuracy index Q9. The results show that the sequence-segmented PseAAC method performs better than PseAAC that extracts feature factor sets from the full sequence. For example, on the C2129 dataset the Q3 and Q9 of the sequence-segmented moment descriptors PseAAC are 84.7% and 60.8%, respectively, which are 1.8 and 2.2 percentage points higher than those of the moment descriptors PseAAC; its Q3 is also 9.1 percentage points higher than that of Xiao’s method. The feature vectors extracted with the sequence-segmented PseAAC method contain not only the order information between residues but also the coupling information among the sub-sequences, and the sub-sequences may correlate with the protein's functional domains. Sequence-segmented PseAAC is therefore an effective method for predicting protein subcellular localization.
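The abstract does not spell out the feature construction, so the following is only a minimal sketch of the general idea: split a sequence into contiguous segments, compute a PseAAC-style vector for each segment, and concatenate them. Everything specific here is an assumption made for illustration and not the authors' implementation: the function names (`split_segments`, `pseaac`, `segmented_pseaac`), the choice of three segments, the use of raw Kyte-Doolittle hydropathy values as the single residue property, the weight `0.05`, and the simple squared-difference correlation factors in place of the moment descriptors mentioned in the abstract.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as an assumed residue property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def split_segments(seq, n_segments):
    """Split seq into n_segments contiguous pieces of near-equal length."""
    base, extra = divmod(len(seq), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments get one additional residue.
        end = start + base + (1 if i < extra else 0)
        segments.append(seq[start:end])
        start = end
    return segments


def pseaac(seq, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional PseAAC-style vector for one sequence."""
    # Drop residues outside the 20 standard amino acids (e.g. 'X').
    seq = "".join(a for a in seq.upper() if a in HYDROPATHY)
    length = len(seq)
    if length == 0:
        return [0.0] * (20 + lam)
    counts = Counter(seq)
    freqs = [counts[a] / length for a in AMINO_ACIDS]
    # Sequence-order correlation factors: mean squared property difference
    # between residues k positions apart, for k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(HYDROPATHY[seq[i]] - HYDROPATHY[seq[i + k]]) ** 2
                 for i in range(length - k)]
        thetas.append(sum(diffs) / len(diffs) if diffs else 0.0)
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def segmented_pseaac(seq, n_segments=3, lam=2, weight=0.05):
    """Concatenate the per-segment PseAAC vectors into one feature vector."""
    features = []
    for segment in split_segments(seq, n_segments):
        features.extend(pseaac(segment, lam=lam, weight=weight))
    return features


if __name__ == "__main__":
    example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
    vector = segmented_pseaac(example, n_segments=3, lam=2)
    print(len(vector))
```

With these assumed settings each protein maps to a vector of length `n_segments * (20 + lam)`, which could then be passed to a support vector machine as the abstract describes. Because each segment is encoded separately, composition differences between, say, the N-terminal and C-terminal regions remain visible instead of being averaged over the full sequence.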