Abstract: Following the success of the Human Genome Project, the gap between the sharply increasing number of protein sequences entering data banks and the slow accumulation of known structures is widening. A fast and accurate method for predicting protein properties from primary sequences has therefore become indispensable. In general, the performance of a predictive system can be improved by selecting an appropriate algorithm and a suitable feature extraction method. We therefore introduce a new feature extraction method, the weighted pseudo-amino acid composition, to predict protein homo-oligomers. It combines a set of weighted discrete sequence correlation factors, computed from an amino acid index profile, with the 20 components of the conventional amino acid composition. As examples, we extract four attribute parameter datasets (COMP, PLIV, FAUJ and MAXF) from the primary sequences. The COMP dataset consists of the amino acid composition alone, while the PLIV, FAUJ and MAXF datasets consist of the amino acid composition plus a set of weighted discrete sequence correlation factors of the corresponding amino acid residue index. In the 10-fold cross-validation (10CV) test, the total accuracies of PLIV, FAUJ and MAXF using the support vector machine (SVM) algorithm are 80.36%, 79.34% and 79.02%, respectively, which are 4.59%, 3.57% and 3.25% higher than that of COMP. On the same COMP and PLIV attribute datasets, the total accuracies of SVM in the jackknife test are 33.87% and 18.05% higher, respectively, than those of the covariant discriminant algorithm.
These results show that this feature extraction method is effective and feasible for predicting homo-oligomers. They imply that the primary sequences of homo-oligomeric proteins contain quaternary structure information, and indicate that SVM outperforms the covariant discriminant algorithm in classifying protein homo-oligomers.
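
The feature construction described in the abstract (20 composition components plus weighted sequence correlation factors derived from an amino acid index) can be illustrated with a minimal sketch. It follows the general form of Chou's pseudo-amino acid composition and is not necessarily the exact formulation used in the paper. The function names, the `max_lag` and `weight` parameters and their default values are assumptions, and the Kyte-Doolittle hydropathy scale stands in for the PLIV, FAUJ and MAXF indices, whose values are not given here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used only as a stand-in index (assumption);
# the paper's PLIV, FAUJ and MAXF indices are not reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index):
    """Rescale an index to zero mean and unit variance over the 20 residues."""
    values = [index[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (index[a] - mean) / sd for a in AMINO_ACIDS}


def correlation_factors(seq, index, max_lag):
    """Mean squared index difference between residues `lag` positions apart."""
    h = standardize(index)
    n = len(seq)
    factors = []
    for lag in range(1, max_lag + 1):
        total = sum((h[seq[i]] - h[seq[i + lag]]) ** 2 for i in range(n - lag))
        factors.append(total / (n - lag))
    return factors


def pseudo_aa_composition(seq, index=KYTE_DOOLITTLE, max_lag=5, weight=0.05):
    """Return a (20 + max_lag)-dimensional feature vector that sums to 1."""
    seq = seq.upper()
    if any(c not in AMINO_ACIDS for c in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]
    thetas = correlation_factors(seq, index, max_lag)
    # Shared denominator: composition terms and weighted factors sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pseudo_aa_composition("MKTAYIAKQRQISFVKSHFSRQ", max_lag=3)
    print(len(vector), round(sum(vector), 6))
```

With `weight=0` the vector reduces to the plain amino acid composition followed by zeros, which corresponds to the COMP-style baseline; a larger `weight` gives the sequence-order terms more influence on the classifier input.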

A study on predicting the cofactors of oxidoreductases based on different sequence features

Predicting the Cofactors of Oxidoreductases Based on Amino Acid Composition Distribution and Chou's Amphiphilic Pseudo-Amino Acid Composition

[Predicting the Cofactors of Oxidoreductases by the Modified Pseudo-Amino Acid Composition].

Predicting protein oxidation sites with feature selection and analysis approach.

Accurate Prediction and Key Protein Sequence Feature Identification of Cyclins

Predicting Protein Structural Classes for Low-Similarity Sequences by Evaluating Different Features

Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature.

Using Chou's Amphiphilic Pseudo-Amino Acid Composition and Support Vector Machine for Prediction of Enzyme Subfamily Classes.

Prediction of Protein Homo-Oligomer Types by Pseudo Amino Acid Composition: Approached with an Improved Feature Extraction and Naive Bayes Feature Fusion.

Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features

Support Vector Machines for Predicting Protein Homo-Oligomers by Incorporating Pseudo-Amino Acid Composition

Prediction of Thermophilic Protein with Pseudo Amino Acid Composition: an Approach from Combined Feature Selection and Reduction.

Improving the Classification of Nuclear Receptors with Feature Selection.

Using Pseudo Amino Acid Composition to Predict Hydrolase Subfamily

Identify Protein 8-Class Secondary Structure with Quadratic Discriminant Algorithm Based on the Feature Combination

AOPM: Application of Antioxidant Protein Classification Model in Predicting the Composition of Antioxidant Drugs

Identification of Disease-Related 2-Oxoglutarate/Fe (II)-Dependent Oxygenase Based on Reduced Amino Acid Cluster Strategy

Prediction of Protein Secondary Structure Using Feature Selection and Analysis Approach

A Study of Prediction Methods for Protein Subcellular Localization

A New Method For Recognizing Cytokines Based On Feature Combination And A Support Vector Machine Classifier

Prediction of Interactiveness of Proteins and Nucleic Acids Based on Feature Selections.