Abstract:
Background: Prediction of protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance because it aids the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach 90%. However, for low-homology sequences, whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracies often below 60%.
Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure of each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. Thus, it is a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
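The two feature-extraction ideas named in the Results, the chaos game representation and the K-string based information entropy, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the predicted secondary structure is a string over the three states H (helix), E (strand) and C (coil), that the three states sit at the corners of an equilateral triangle, that the walk starts at the triangle's centroid, and that the entropy is Shannon entropy in bits. The names `VERTICES`, `cgr_time_series` and `k_string_entropy` are hypothetical.

```python
import math
from collections import Counter

# Assumed layout: one corner of an equilateral triangle per state.
# The abstract does not specify the vertex placement.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_time_series(ss, start=None):
    """Map a secondary structure string to two series (x and y coordinates).

    Each step moves the current point halfway toward the vertex of the
    next state, which is the standard chaos game iteration.
    """
    if start is None:
        # Assumed starting point: the centroid of the triangle.
        start = (0.5, math.sqrt(3) / 6)
    x, y = start
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of the overlapping length-k substrings."""
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    predicted = "CCHHHHHHCCEEEEECC"  # made-up example string, not real PSIPRED output
    xs, ys = cgr_time_series(predicted)
    print(len(xs), len(ys))
    print(k_string_entropy(predicted, 2))
```

The two coordinate series returned by `cgr_time_series` are the kind of input that recurrence quantification analysis would then operate on; that step and the Fisher's discriminant classifier are not sketched here.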

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure