Using Correlation Analysis and Nonnegative Matrix Factorization to Predict Protein Structural Classes via Position-Specific Scoring Matrix

Yunyun Liang, Sanyang Liu, Shengli Zhang
2016-01-01
Abstract: Prediction of protein structural classes plays an important role in protein science, supporting tasks such as protein function prediction, protein fold recognition, and protein folding rate analysis. Prediction based solely on the position-specific scoring matrix (PSSM) has played a key role in improving prediction accuracy. Feature extraction and feature selection are two critical steps that determine prediction quality. In this paper, we propose a novel method that applies correlation analysis to the PSSM. A 3600-dimensional (3600D) feature vector is constructed, and its dimension is then reduced to 200D using nonnegative matrix factorization (NMF). To evaluate the proposed method, jackknife cross-validation tests are performed on two widely used low-similarity datasets: 1189 and 25PDB. Our method achieves favorable prediction accuracies and outperforms the other listed PSSM-based methods. The results suggest that our approach offers a reliable tool for predicting protein structural classes, especially for low-similarity sequences.
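The pipeline described in the abstract (correlation features from a PSSM, then NMF for dimension reduction) can be illustrated with a minimal sketch. The abstract does not say how the 3600 features are built, so everything specific below is an assumption rather than the authors' method: the features are taken to be lagged auto- and cross-covariances between the 20 PSSM columns (20 x 20 pairs over 9 lags happens to give 3600), the features are min-max scaled so that the matrix is nonnegative, and NMF is done with the standard Lee-Seung multiplicative updates. The names `correlation_features`, `min_max_scale`, and `nmf` are hypothetical.

```python
import random


def correlation_features(pssm, max_lag):
    """Lagged auto-/cross-covariance between PSSM columns (assumed feature design).

    pssm is a list of L rows with one score per amino-acid column, L > max_lag.
    Returns n_cols * n_cols * max_lag values (3600 for 20 columns and 9 lags).
    """
    length, n_cols = len(pssm), len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    (pssm[i][a] - means[a]) * (pssm[i + lag][b] - means[b])
                    for i in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; NMF needs nonnegative input."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant columns map to 0
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def nmf(v, rank, iters=100, seed=0, eps=1e-9):
    """Factor V (n x m) ~= W (n x rank) times H (rank x m), all nonnegative.

    Uses Lee-Seung multiplicative updates for the squared-error objective.
    Each row of W is the reduced feature vector of one protein.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy usage with random stand-in "PSSMs" (not real data): 6 proteins, length 40.
rng = random.Random(42)
toy_pssms = [[[rng.randint(-8, 8) for _ in range(20)] for _ in range(40)]
             for _ in range(6)]
feature_matrix = min_max_scale([correlation_features(p, 2) for p in toy_pssms])
reduced, basis = nmf(feature_matrix, rank=3, iters=50)
```

In the setting of the abstract, `max_lag=9` and `rank=200` would reproduce the stated 3600D and 200D sizes under these assumptions. For real datasets a vectorized implementation (for example NumPy arrays or a library NMF routine) would be far faster than these list-based helpers.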