Abstract:
Background: Predicting protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it aids the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach 90%. However, for low-homology sequences, whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracies often below 60%.
Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure of each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm to predict the protein structural class. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, the local interactions of the secondary structural elements, and the spatial arrangements of alpha helices and beta strands. It is therefore a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
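
The abstract does not spell out how the chaos game representation or the K-string entropy are computed, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes the predicted secondary structure is a string over the three states H (helix), E (strand) and C (coil), places each state at a corner of an equilateral triangle, and moves halfway toward the corner of each successive state; the resulting x and y coordinates form the two time series. The vertex placement, the starting point, the function names `cgr_time_series` and `k_string_entropy`, and the use of plain Shannon entropy over overlapping K-strings are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed layout: one triangle corner per secondary structure state.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_time_series(ss, start=(0.5, math.sqrt(3) / 6)):
    """Map an H/E/C string to two time series (xs, ys).

    Starting from `start` (assumed here to be the triangle's centroid),
    each step moves halfway toward the corner of the current state.
    """
    x, y = start
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of the overlapping length-k substrings.

    This is a simplified stand-in for a K-string based entropy feature.
    """
    n = len(ss) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(ss[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    predicted = "CCHHHHHHCCEEEEECC"  # hypothetical PSIPRED-style output
    xs, ys = cgr_time_series(predicted)
    print(len(xs), len(ys))
    print(k_string_entropy(predicted, 2))
```

In a full pipeline along the lines the abstract describes, the two series would feed recurrence quantification analysis and the entropy values would join the segment-based features in the vector passed to the classifier; those steps are not shown here.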

The prediction of protein structural class using averaged chemical shifts

Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines.

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

How Good is Prediction of Protein Structural Class by the Component-Coupled Method?

Folding rate prediction based on neural network model

Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM

Prediction of Protein Structural Class for Low-Similarity Sequences Using Chou's Pseudo Amino Acid Composition and Wavelet Denoising.

PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations

Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features

Using LogitBoost classifier to predict protein structural classes.

Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor

EFG-CS: Predicting chemical shifts from amino acid sequences with protein structure prediction using machine learning and deep learning models

Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

Secondary structure-based assignment of the protein structural classes

Accurate Prediction of Chemical Shifts for Aqueous Protein Structure for "Real World" Cases using Machine Learning

Prediction of Protein 13Cα NMR Chemical Shifts Using a Combination Scheme of Statistical Modeling and Quantum-Mechanical Analysis

Protein structural class prediction using physiochemical property based grouped weighted encoding index

Amino acid torsion angles enable prediction of protein fold classification

Improving prediction accuracy for protein structure classification by neural network using feature combination

Efficient and Interpretable Prediction of Protein Functional Classes by Correspondence Analysis and Compact Set Relations.

Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes