Abstract: Background: Prediction of protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it aids the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, for which prediction accuracies can reach 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracy often below 60%.
Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure of each protein sequence. The chaos game representation is then used to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm to predict the protein structural class. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. The method is therefore valuable for predicting protein structural classes, particularly for low-homology amino acid sequences.
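
To make the feature-extraction step more concrete, the following is a minimal Python sketch of two of the ingredients named in the abstract: a chaos game representation that turns a predicted secondary structure string into two time series, and an entropy over K-strings. It is an illustration under stated assumptions, not the authors' implementation. In particular, the three-state alphabet (H, E, C), the equilateral-triangle vertex layout, the centroid starting point, and the use of Shannon entropy over overlapping K-strings are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

# Assumed layout: one triangle corner per secondary structure state
# (H = helix, E = strand, C = coil). The abstract does not specify this.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_time_series(ss):
    """Map a secondary structure string to two time series (x and y).

    Starting from the triangle centroid (an assumption), each step moves
    halfway towards the vertex of the current state, which is the standard
    chaos game iteration.
    """
    x, y = 0.5, math.sqrt(3) / 6  # centroid of the triangle
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of the overlapping length-k substrings.

    This is one plausible reading of "K-string based information entropy";
    the exact definition used by the method may differ.
    """
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `cgr_time_series("HHEECC")` returns two lists of six coordinates each, which could then be passed to a recurrence quantification routine, while `k_string_entropy("HHEECC", 2)` summarizes how evenly the two-state patterns are distributed along the sequence.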
