Abstract: As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. Most existing machine learning-based protein fold recognition approaches rely heavily on handcrafted features that depict the characteristics of different protein folds; however, effective feature extraction remains the bottleneck for further performance improvement. As a powerful feature extractor, a deep convolutional neural network (DCNN) can automatically extract discriminative features for fold recognition without human intervention and has demonstrated impressive performance on this task. Despite the encouraging progress, a DCNN often acts as a black box, so it is challenging for users to understand what happens inside it and why it works well for protein fold recognition. In this study, we explore the intrinsic mechanism of DCNN and explain why it works for protein fold recognition using a visual explanation technique. More specifically, we first trained a VGGNet-based DCNN model, termed VGGNet-FE, which can extract fold-specific features from the predicted protein residue–residue contact map for protein fold recognition. Subsequently, based on the trained VGGNet-FE, we implemented a new contact-assisted predictor, termed VGGfold, for protein fold recognition; we then visualized the features extracted by each of the convolutional layers in VGGNet-FE using a deconvolution technique. Furthermore, we visualized the high-level semantic information, termed the fold-discriminative region, of a predicted contact map from the localization map obtained from the last convolutional layer of VGGNet-FE. The visualizations confirm that VGGNet-FE can effectively extract distinct fold-discriminative regions for different types of protein folds, thereby accounting for the improved performance of VGGfold for protein fold recognition.
In summary, this study is significant both for understanding the working principle of DCNNs in protein fold recognition and for exploring the relationship between the predicted protein contact map and protein tertiary structure. The proposed visualization method is flexible and can be applied to other DCNN-based bioinformatics and computational biology questions. The online web server of VGGfold is freely available at http://csbio.njust.edu.cn/bioinf/vggfold/.
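
To make the localization-map idea in the abstract more concrete, the following is a minimal sketch of how a class-specific localization map could be derived from the activations of a last convolutional layer. It assumes a Grad-CAM-style weighting (channel weights taken as spatially averaged gradients); the abstract does not state the exact formula, so this is an illustration rather than the VGGfold implementation. The function names `localization_map` and `upsample_nearest`, and the array shapes, are hypothetical.

```python
import numpy as np


def localization_map(feature_maps, gradients):
    """Grad-CAM-style map (an assumed technique, not confirmed by the abstract).

    feature_maps: array of shape (K, H, W), last-conv-layer activations.
    gradients:    array of shape (K, H, W), derivative of the score of one
                  fold class with respect to feature_maps.
    """
    # One weight per channel: the spatial average of that channel's gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, keeping only positive evidence (ReLU).
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam, factor):
    """Repeat each cell factor x factor times so the map can overlay the input."""
    return np.kron(cam, np.ones((factor, factor)))
```

In practice the activations and gradients would come from a trained network applied to a predicted contact map; here they are plain arrays so the arithmetic can be checked in isolation. Cells with high values in the upsampled map mark the regions of the contact map that most support the chosen fold class under this weighting.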

DeepFrag-k: a Fragment-Based Deep Learning Approach for Protein Fold Recognition.

Learning Structural Motif Representations for Efficient Protein Structure Search

Improved Fragment Sampling for Ab Initio Protein Structure Prediction Using Deep Neural Networks

Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks

Improving protein fold recognition using triplet network and ensemble deep learning

DeepSF: deep convolutional neural network for mapping protein sequences to folds

FFF: Fragments-Guided Flexible Fitting for Building Complete Protein Structures

Protein Fold Recognition based on Multi-view Modeling.

Learning Protein Embedding to Improve Protein Fold Recognition Using Deep Metric Learning

Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?

DeepFoldit -- A Deep Reinforcement Learning Neural Network Folding Proteins

Distance-based protein folding powered by deep learning

Protein Fold Recognition Based on Auto-Weighted Multi-View Graph Embedding Learning Model

DeepFold: Enhancing Protein Structure Prediction through Optimized Loss Functions, Improved Template Features, and Re-optimized Energy Function

Prediction of Protein Local Structures and Folding Fragments Based on Building-Block Library

Deep Learning-Based Advances in Protein Structure Prediction

DeepDTAF: a deep learning method to predict protein–ligand binding affinity

Performing protein fold recognition by exploiting a stack convolutional neural network with the attention mechanism

Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation

Structure-based, deep-learning models for protein-ligand binding affinity prediction