Abstract: As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. Most existing machine learning-based protein fold recognition approaches rely heavily on handcrafted features that depict the characteristics of different protein folds; however, effective feature extraction remains the bottleneck for further performance improvement. As a powerful feature extractor, a deep convolutional neural network (DCNN) can automatically extract discriminative features for fold recognition without human intervention, and DCNNs have demonstrated impressive performance on this task. Despite this encouraging progress, a DCNN often acts as a black box, so it is difficult for users to understand what happens inside the network and why it works well for protein fold recognition. In this study, we explore the intrinsic mechanism of DCNNs and explain why they work for protein fold recognition using a visual explanation technique. More specifically, we first trained a VGGNet-based DCNN model, termed VGGNet-FE, which can extract fold-specific features from the predicted protein residue–residue contact map for protein fold recognition. Subsequently, based on the trained VGGNet-FE, we implemented a new contact-assisted predictor, termed VGGfold, for protein fold recognition; we then visualized the features extracted by each of the convolutional layers in VGGNet-FE using a deconvolution technique. Furthermore, we visualized the high-level semantic information, termed the fold-discriminative region, of a predicted contact map from the localization map obtained from the last convolutional layer of VGGNet-FE. The visualizations confirm that VGGNet-FE can effectively extract distinct fold-discriminative regions for different types of protein folds, thereby accounting for the improved performance of VGGfold for protein fold recognition.
In summary, this study contributes both to understanding the working principle of DCNNs in protein fold recognition and to exploring the relationship between the predicted protein contact map and protein tertiary structure. The proposed visualization method is flexible and applicable to other DCNN-based bioinformatics and computational biology questions. The online web server of VGGfold is freely available at http://csbio.njust.edu.cn/bioinf/vggfold/.
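
To make the idea of a localization map computed from the last convolutional layer more concrete, the following is a minimal sketch under stated assumptions. The abstract does not name the weighting scheme, so the sketch assumes a Grad-CAM-style formulation: each channel of the last convolutional layer is weighted by the mean gradient of the fold score with respect to that channel, the weighted channels are summed, and negative values are clipped. The function names `localization_map` and `normalize`, and the toy inputs, are hypothetical and are not taken from VGGfold.

```python
from typing import List

Matrix = List[List[float]]


def localization_map(feature_maps: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Grad-CAM-style map (an assumed formulation, not the authors' code).

    feature_maps: K channels of shape H x W from the last convolutional layer.
    gradients:    d(fold score)/d(feature map), same shape as feature_maps.
    Returns ReLU(sum_k alpha_k * A_k), where alpha_k is the mean gradient of channel k.
    """
    if not feature_maps or len(feature_maps) != len(gradients):
        raise ValueError("feature_maps and gradients must be non-empty and aligned")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(feature_maps, gradients):
        # Channel weight: global average of the gradient over all positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heat[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that push the fold score up.
    return [[max(0.0, value) for value in row] for row in heat]


def normalize(heat: Matrix) -> Matrix:
    """Scale a non-negative map to [0, 1] so it can be overlaid on a contact map."""
    peak = max(max(row) for row in heat)
    if peak == 0.0:
        return [row[:] for row in heat]
    return [[value / peak for value in row] for row in heat]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 example; real inputs would come from a trained network.
    maps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    for row in normalize(localization_map(maps, grads)):
        print(row)
```

In practice the resulting map would be upsampled to the size of the input contact map before being overlaid on it; that step is omitted here to keep the sketch dependency-free.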

MotifCNN-fold: Protein Fold Recognition Based on Fold-Specific Features Extracted by Motif-Based Convolutional Neural Networks

SelfAT-Fold: Protein Fold Recognition Based on Residue-Based and Motif-Based Self-Attention Networks

Performing protein fold recognition by exploiting a stack convolutional neural network with the attention mechanism

Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation

Protein Fold Recognition based on Multi-view Modeling.

Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks

Protein Fold Recognition with Support Vector Machines Fusion Network

Learning structural motif representations for efficient protein structure search

FoldRec-C2C: Protein Fold Recognition by Combining Cluster-to-cluster Model and Protein Similarity Network

Improving protein fold recognition using triplet network and ensemble deep learning

RFRSN: Improving protein fold recognition by siamese network

DeepSF: deep convolutional neural network for mapping protein sequences to folds

DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks

MLDH-Fold: Protein Fold Recognition Based on Multi-View Low-Rank Modeling

Protein Fold Recognition Based on Sparse Representation Based Classification.

Learning Protein Embedding to Improve Protein Fold Recognition Using Deep Metric Learning

PiFold: Toward effective and efficient protein inverse folding

Protein Folds Recognized by an Intelligent Predictor Based-on Evolutionary and Structural Information.

Improved Method for Predicting Protein Fold Patterns with Ensemble Classifiers.

ReFold-MAP: Protein Remote Homology Detection and Fold Recognition Based on Features Extracted from Profiles.