Abstract: Protein fold recognition is a critical step toward protein structure and function prediction; it aims to identify the most likely fold type of a query protein. In recent years, advances in deep learning (DL) techniques have driven major progress in this field, and the sensitivity of protein fold recognition has improved dramatically. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient. It also rests on the assumption that the bottleneck layer's representation is a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, which uses a triplet loss to train a deep convolutional network. FoldNet directly optimizes the protein fold embedding itself. In the learned embedding space, proteins with the same fold type therefore lie closer to each other than to proteins with different fold types. Using the trained FoldNet, we then implement a new residue–residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost. It combines the protein fold embedding with two other discriminative fold-specific features, which are extracted by the DL-based methods SSAfold and DeepFR. FSD_XGBoost raises the Top-1 sensitivity at the fold level to 74.8%, which is ~9% higher than that of the state-of-the-art method. Together, these results suggest that fold-specific features extracted by different DL methods complement each other, and that combining them can further improve fold recognition at the fold level.
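The triplet objective that FoldNet optimizes can be illustrated with a minimal sketch. It assumes the standard margin-based triplet loss on squared Euclidean distance, with a hypothetical margin of 1.0. The abstract does not specify the distance, the margin, or the convolutional network that produces the embeddings, so none of these details should be read as FoldNet's actual settings.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical margin, not taken from the paper
) -> float:
    """Margin-based triplet loss for one (anchor, positive, negative) triple.

    `anchor` and `positive` embed proteins with the same fold type;
    `negative` embeds a protein with a different fold type. The loss is zero
    once the negative is farther (in squared distance) from the anchor than
    the positive by at least `margin`; otherwise it penalizes the violation.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triple: the margin is satisfied, so the loss is zero.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0]))  # 0.0
# Negative almost as close as the positive: the loss is positive (about 0.97).
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0]))
```

Minimizing this loss over many triples pulls same-fold embeddings together and pushes different-fold embeddings apart. This is the geometric property the abstract attributes to the FoldNet embedding space.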
The FoldTR web server and the benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
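Fold-level Top-1 sensitivity is commonly defined as the fraction of query proteins whose top-ranked template belongs to the same fold. The sketch below illustrates that metric with a simple nearest-neighbour ranking by cosine similarity between embeddings. The ranking rule, the fold labels, and the helper names are hypothetical stand-ins. They are not the scoring actually used by FoldTR or FSD_XGBoost, which also draw on contact-assisted and other fold-specific features.

```python
import math
from typing import List, Sequence, Tuple

# (embedding vector, fold label); the labels below are hypothetical.
Embedded = Tuple[Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top1_sensitivity(queries: List[Embedded], templates: List[Embedded]) -> float:
    """Fraction of queries whose most similar template shares their fold."""
    hits = 0
    for q_emb, q_fold in queries:
        # Rank templates by similarity to this query and keep the best one.
        _, best_fold = max(templates, key=lambda t: cosine_similarity(q_emb, t[0]))
        if best_fold == q_fold:
            hits += 1
    return hits / len(queries)


templates = [([1.0, 0.0], "fold_A"), ([0.0, 1.0], "fold_B")]
queries = [
    ([0.9, 0.1], "fold_A"),  # nearest template is fold_A: hit
    ([0.2, 0.8], "fold_B"),  # nearest template is fold_B: hit
    ([0.7, 0.3], "fold_B"),  # nearest template is fold_A: miss
]
print(top1_sensitivity(queries, templates))  # 2 of 3 queries correct
```

An embedding in which same-fold proteins cluster tightly makes this nearest-template lookup more reliable. That is why the triplet-trained embedding is expected to help at the fold level.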
